Abstract

In this paper, we study the scheduling problem for downlink transmission in a multi-channel (e.g.,
OFDM-based) wireless network. We focus on a single cell, with the aim of developing a unifying
framework for designing low-complexity scheduling policies that can provide optimal performance in terms of
both throughput and delay. We develop new easy-to-verify sufficient conditions for rate-function delay
optimality (in the many-channel many-user asymptotic regime) and throughput optimality, respectively.
The sufficient conditions allow us to prove rate-function delay optimality for a class of Oldest Packets First
(OPF) policies and throughput optimality for a large class of Maximum Weight in the Fluid limit (MWF)
policies, respectively. By exploiting the special features of our carefully chosen sufficient conditions and
intelligently combining policies from the classes of OPF and MWF policies, we design hybrid policies
that are both rate-function delay-optimal and throughput-optimal with a complexity of $O(n^{2.5} \log n)$,
where n is the number of channels or users. Our approach yields significantly lower complexity than
the only previously known delay (rate-function) and throughput optimal scheduling policy, which incurs
a high complexity of $O(n^5)$. We also conduct numerical experiments to validate our theoretical results.

I. Introduction 

Designing high-performance scheduling algorithms has been a vital and challenging problem in wireless
networks. Among the many dimensions of network performance, the most critical ones are perhaps
throughput, delay, and complexity. However, it is in general extremely difficult, if not impossible, to
develop scheduling policies that attain the optimal performance in terms of both throughput and delay,
without the cost of high complexity [1].

In this paper, we focus on the setting of a single-hop multi-user multi-channel system. A practically 
important example of such a multi-channel system is the downlink of a single cell in 4G OFDM-based
cellular networks (e.g., LTE and WiMax). Such a system typically has a large bandwidth that can be
divided into multiple orthogonal sub-bands (or channels), which need to be allocated to a large number 
of users by a scheduling algorithm. The main question that we will attempt to answer in this paper is the 
following: How do we design efficient scheduling algorithms that simultaneously provide high throughput, 
small delay, and low complexity? 

We assume that the multi-channel system has n channels and a proportionally large number of users. 
This setting is referred to as the many-channel many-user asymptotic regime, when n becomes large. The 
connectivity between each user and each channel is assumed to be time-varying, due to channel fading. 
We assume that the base station (BS) maintains a separate First-in First-out (FIFO) queue that buffers 
the packets destined to each user. The delay metric that we will focus on in this paper is the asymptotic 
decay-rate of the probability that the largest packet waiting time in the system exceeds a fixed threshold, 
as both the number of channels and the number of users become large. (Refer to Eq. (2) for the precise
definition.) This decay-rate is also called the rate-function in large-deviations theory.

There are a few recent works that consider a system similar to ours. In [2], the authors studied
a minimization problem of queue-length-based cost functions over a finite horizon. The cost function
considered there is convex, strictly increasing, and includes the expected total queue length as a special
case. The authors showed that their defined delay optimality can be achieved in certain special cases.
More recently, a number of queue-length-based scheduling policies [3]-[6] have been developed to achieve
small queue-lengths in the many-channel many-user asymptotic regime. Further, an optimal scheduling
policy that maximizes the queue-length-based rate-function¹ has been derived with complexity 0{n^)
[6]. However, this line of work has two key limitations. First, the schedulers' performance is proven
under the assumption that the arrival process is i.i.d. not only across users, but also in time, which does
not model the temporal correlation present in most real network traffic. More importantly, it is well
known that good queue-length performance does not necessarily translate to good delay performance [7],
[8]. A recently developed scheduling policy called Delay Weighted Matching (DWM) [7], which makes
scheduling decisions by maximizing the sum of the delays of the packets scheduled in each time-slot, has
been shown to be both throughput-optimal and rate-function delay-optimal. However, the main drawback
of DWM is that it results in a very high complexity of $O(n^5)$, and is hence not amenable to practical
implementation.

¹The queue-length-based rate-function is defined as the asymptotic decay-rate of the probability that the largest queue length
in the system exceeds a fixed threshold.

Hence, the state-of-the-art does not satisfactorily answer our main question of how to design scheduling
policies with a low complexity, while guaranteeing provable optimality for both throughput and delay. 
In this paper, we address this challenge, and provide the following key intellectual contributions. 

First, we characterize easy-to-verify sufficient conditions for rate-function delay optimality in the
many-channel many-user asymptotic regime and for throughput optimality in general non-asymptotic settings.
The sufficient conditions allow us to prove rate-function delay optimality for a class of Oldest Packets
First (OPF) policies and throughput optimality for a large class of Maximum Weight in the Fluid limit
(MWF) policies.

Second, we develop an $O(n^{2.5} \log n)$-complexity scheduling policy called DWM-n. DWM-n shares
high-level similarities with the DWM policy, but makes scheduling decisions in each time-slot by maximizing
the sum of the delays of the scheduled packets over only the n oldest packets in the system, rather than
over all the packets as in the DWM policy [7]. We show that DWM-n is an OPF policy and is thus
rate-function delay-optimal. However, DWM-n is not throughput-optimal in general, and may perform poorly
when n is not large.

Third, by exploiting the special features of our carefully chosen sufficient conditions and intelligently 
combining policies from the classes of OPF and MWF policies, we develop a class of two-stage hybrid 
policies that simultaneously achieve both rate-function delay optimality and throughput optimality. The 
basic idea is as follows. In stage 1, we choose an OPF policy and focus on scheduling the n oldest 
packets. This not only guarantees rate-function delay optimality, but also satisfies the sufficient condition 
for throughput optimality for all selected servers in stage 1. In stage 2, the selected servers in stage 1 will 
not be considered. For the remaining servers, we run a policy from the class of MWF policies. Since the 
chosen MWF policy is run over the remaining servers that were not selected in stage 1, it ensures that 
the sufficient condition for throughput optimality is satisfied for these remaining servers. Further, since 
the packets and servers matched in stage 1 are not touched, the satisfaction of the sufficient condition 
for rate-function delay optimality is not perturbed. In particular, we can adopt the DWM-n policy in stage 1
and the Delay-based MaxWeight Scheduling (D-MWS) policy in stage 2, respectively, so as to design a
hybrid policy with a low complexity of $O(n^{2.5} \log n)$.

Finally, we conduct numerical experiments to validate our theoretical results in different scenarios. 

II. System Model 

We consider a multi-channel system with n orthogonal channels and n users, which can be modeled as a
multi-queue multi-server system with stochastic connectivity, as shown in Fig. 1. For ease of presentation,
the number of users is assumed to be equal to the number of channels. Our analysis for rate-function
delay optimality follows similarly if the number of users scales linearly with the number of channels.
Throughout the rest of the paper, we will use the terms "user" and "queue" interchangeably, and use the
terms "channel" and "server" interchangeably. We assume that time is slotted. In a time-slot, a server
can be allocated to only one queue, but a queue can get service from multiple servers. The connectivity
between queues and servers is time-varying, i.e., it can change between "ON" and "OFF" from time to
time. We assume that perfect channel state information (i.e., whether each channel is ON or OFF for
each user in each time-slot) is known at the BS. This is a reasonable assumption in the downlink scenario
of a single cell in a multi-channel cellular system with dedicated feedback channels.

Fig. 1. System model. The connectivity between each pair of queue $Q_i$ and server $S_j$ is "ON" (denoted by a solid line) with
probability q, and "OFF" (denoted by a dashed line) otherwise.

The notation used in this paper is as follows. We let $Q_i$ denote the FIFO queue (at the BS) associated
with the i-th user, and let $S_j$ denote the j-th server. We assume an infinite buffer for all the queues. Let $A_i(t)$
denote the number of packet arrivals to queue $Q_i$ in time-slot t, and let $\lambda_i$ be the mean arrival rate of
queue $Q_i$. We then let $\lambda = [\lambda_1, \lambda_2, \ldots, \lambda_n]$ denote the arrival rate vector. We assume that packet arrivals
occur at the beginning of each time-slot, and packet departures occur at the end of each time-slot. We
let $Q_i(t)$ denote the length of queue $Q_i$ at the beginning of time-slot t immediately after packet arrivals.
Also, let $Z_{i,l}(t)$ denote the delay (i.e., waiting time) of the l-th packet of queue $Q_i$ at the beginning of
time-slot t, which is measured from the time when the packet arrived at queue $Q_i$ until the beginning of
time-slot t. Note that at the end of each time-slot, the packets still present in the system will have their
delays increased by one due to the elapsed time. We then let $W_i(t) = Z_{i,1}(t)$ denote the head-of-line (HOL) packet
delay of queue $Q_i$ at the beginning of time-slot t. Further, we use $C_{i,j}(t)$ to denote the capacity of the
link between queue $Q_i$ and server $S_j$ in time-slot t, i.e., the maximum number of packets that can be
served by server $S_j$ from queue $Q_i$ in time-slot t. Finally, we define
$D(x\|y) = x \log \frac{x}{y} + (1 - x) \log \frac{1 - x}{1 - y}$
and $(x)^+ = \max(x, 0)$, and let $\mathbb{1}_{\{\cdot\}}$ denote the indicator function.

We now state the assumptions on the arrival processes. The throughput analysis is carried out under
very general conditions (Assumption 1), similar to those of [9].

Assumption 1: For each user $i \in \{1, 2, \ldots, n\}$, the arrival process $A_i(t)$ is an irreducible and positive
recurrent Markov chain with countable state space, and satisfies the Strong Law of Large Numbers: That
is, with probability one,

$$\lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} A_i(\tau) = \lambda_i. \quad (1)$$

We also assume that the arrival processes are mutually independent across users (which can be relaxed
for showing throughput optimality, as discussed in [9]).

Assumptions 2 and 3 (stated below) have been used in the previous work [7] for rate-function delay
analysis.

Assumption 2: There exists a finite L such that $A_i(t) \le L$ for any i and t, i.e., instantaneous arrivals
are bounded.

Assumption 3: The arrival processes are i.i.d. across users, and $\lambda_i = p$ for any user i. Given any $\epsilon > 0$
and $\delta > 0$, there exist $T > 0$, $N > 0$, and a positive function $I_B(\epsilon, \delta)$ independent of n and t such that

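$$P\left( \frac{1}{t} \sum_{\tau=1}^{t} \mathbb{1}_{\left\{ \left| \sum_{i=1}^{n} A_i(\tau) - pn \right| > \epsilon n \right\}} > \delta \right) < \exp(-n t I_B(\epsilon, \delta)),$$

for all $t > T$ and $n > N$.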

As mentioned in [7], Assumptions 2 and 3 are relatively mild. Assumption 2 requires that the arrivals
in each time-slot have bounded support. Assumption 3 allows the arrivals for each user to be correlated
over time (e.g., arrivals driven by a two-state Markov chain), which is more general than the arrival
processes (i.i.d. in time) considered in [2]-[6].

We then describe our channel model as follows. 

Assumption 4: In any time-slot t, $C_{i,j}(t)$ is modeled as a Bernoulli random variable with a parameter
$q \in (0, 1)$, i.e.,

$$C_{i,j}(t) = \begin{cases} 1, & \text{with probability } q, \\ 0, & \text{with probability } 1 - q. \end{cases}$$

All the random variables $C_{i,j}(t)$ are assumed to be mutually independent across all the variables i, j, and
t.
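As an illustration of Assumption 4, the following minimal sketch draws one time-slot of connectivity for an n x n system. The function name `sample_connectivity` and the use of Python's standard `random` module are assumptions made for illustration only; they are not part of the model description.

```python
import random

def sample_connectivity(n, q, rng):
    """Draw one time-slot of connectivity: C[i][j] = 1 with probability q, else 0.

    Every entry is drawn independently, matching the i.i.d. Bernoulli(q) model.
    """
    return [[1 if rng.random() < q else 0 for _ in range(n)] for _ in range(n)]

rng = random.Random(0)                # fixed seed so the sketch is reproducible
C = sample_connectivity(4, 0.5, rng)  # call once per time-slot for independence in t
```

Calling the function again with the same generator yields an independent draw, which corresponds to independence across time-slots.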

We assume unit channel capacity for ease of exposition only, and our analysis holds similarly for the 
general ON-OFF channel model. Under this assumption, we will also let $C_{i,j}(t)$ denote the connectivity
between queue $Q_i$ and server $S_j$ in time-slot t, without causing any confusion. As in the previous works
ll2l-|lll> Il21> we assume i.i.d. channels for the analytical results in this paper. The sub-bands being i.i.d. is a
reasonable assumption when the channel width is larger than the coherence bandwidth of the environment. 
Moreover, we will show through simulations that our proposed low-complexity solution also performs 
well in more general scenarios, e.g., accounting for heterogeneous channels that are correlated over time. 
Further, we will briefly discuss how to generalize our solution to more general scenarios towards the end 
of this paper. 

Next, we define the optimal throughput region (or stability region) of the system for any fixed integer 
n > 0. As in [9], a stochastic queueing network is said to be stable if it can be described as a discrete- 
time countable Markov chain and the Markov chain is stable in the following sense: The set of positive 
recurrent states is nonempty, and it contains a finite subset such that with probability one, this subset is 
reached within finite time from any initial state. When all the states communicate, stability is equivalent 
to the Markov chain being positive recurrent. The throughput region of a scheduling policy is defined as 
the set of arrival rate vectors for which the network remains stable under this policy. Further, the optimal 
throughput region is defined as the union of the throughput regions of all possible scheduling policies. 
We let $\Lambda^*$ denote the optimal throughput region. A scheduling policy is throughput-optimal if it can
stabilize any arrival rate vector $\lambda$ strictly inside $\Lambda^*$. For more discussion on the characterization of $\Lambda^*$,
please refer to Appendix A.

For delay analysis, we consider the many-queue many-server asymptotic regime. Let $W(t)$ denote the
largest HOL delay over all the queues (i.e., the largest or worst packet waiting time in the system) at
the beginning of time-slot t, i.e., $W(t) = \max_{1 \le i \le n} W_i(t)$. Assuming that the system is stationary and
ergodic, we define the rate-function for integer threshold $b > 0$ as

$$I(b) = \lim_{n \to \infty} -\frac{1}{n} \log P(W(0) > b). \quad (2)$$

We can then estimate $P(W(0) > b) \approx \exp(-n I(b))$ when n is large, and the estimation accuracy tends
to be higher as n increases. For large n, it is clear that a larger value of the rate-function leads to better
delay performance, i.e., a smaller probability that the largest HOL delay exceeds a certain threshold. A
scheduling policy is rate-function delay-optimal if it achieves the maximum rate-function over all possible
scheduling policies, for any fixed integer threshold $b > 0$.
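To make the approximation $P(W(0) > b) \approx \exp(-n I(b))$ concrete, the sketch below computes the finite-n quantity $-\frac{1}{n} \log P$ from a given violation probability. The helper name `empirical_rate` is hypothetical, and the probability is assumed to come from simulation or analysis; the value returned is only a finite-n proxy for the limit in (2).

```python
import math

def empirical_rate(prob_exceed, n):
    """Finite-n proxy for I(b): -(1/n) * log P(W(0) > b)."""
    if not 0.0 < prob_exceed <= 1.0:
        raise ValueError("prob_exceed must lie in (0, 1]")
    return -math.log(prob_exceed) / n

# If the violation probability decays exactly as exp(-0.3 * n),
# the proxy recovers the decay rate 0.3 for n = 50.
rate = empirical_rate(math.exp(-0.3 * 50), 50)
```

A probability that stays bounded away from zero as n grows gives a proxy that tends to zero, which is the zero rate-function case.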

Note that the rate-function optimality is studied in the asymptotic regime, i.e., when n goes to infinity. 
Although the convergence of the rate-function is typically fast, the throughput performance may be poor 
for small to moderate values of n. As a matter of fact, a rate-function delay-optimal policy may not even
be throughput-optimal for a fixed n (e.g., the DWM-n policy that we will propose in Section III). To
that end, we are interested in designing scheduling policies that maximize both the throughput (for any
fixed n) and the rate-function (in the many-queue many-server asymptotic regime).

III. Sufficient Conditions 

In [7], the authors proposed the DWM policy that is both throughput-optimal and rate-function
delay-optimal.² However, its high complexity of $O(n^5)$ renders it impractical. Hence, the big challenge is to find
low-complexity scheduling policies that are both throughput-optimal and rate-function delay-optimal.
To that end, in this section, we first characterize easy-to-verify sufficient conditions for rate-function
delay optimality in the many-queue many-server asymptotic regime and for throughput optimality in
non-asymptotic settings. We then develop classes of policies that satisfy the sufficient conditions for
rate-function delay optimality (called the Oldest Packets First (OPF) policies) and throughput optimality
(called the Maximum Weight in the Fluid limit (MWF) policies), respectively.

As discussed in the introduction, our ultimate goal is to develop low-complexity hybrid policies that 
are both rate-function delay-optimal and throughput-optimal. However, it is unclear that, just because 
one policy is rate-function delay-optimal and another one is throughput-optimal, their combinations will 
necessarily yield the right hybrid policy that is optimal in terms of both throughput and delay. As we 
will discuss further at the beginning of Section IV, our carefully chosen sufficient conditions possess
some special features that allow us to construct a low-complexity hybrid policy that is both rate-function 
delay-optimal and throughput-optimal. 

A. Rate-function Delay-optimality 

We start by presenting the main result of this section in the following theorem, which provides a 
sufficient condition for scheduling policies to be rate-function delay-optimal. 

Theorem 1: Under Assumptions 2 and 3, a scheduling policy P is rate-function delay-optimal if in
any time-slot, policy P can serve the k oldest packets in that time-slot for the largest possible value of
$k \in \{1, 2, \ldots, n\}$.

Definition 1: A scheduling policy P is said to be in the class of Oldest Packets First (OPF) policies
if policy P satisfies the sufficient condition in Theorem 1.

It is clear from Theorem 1 that all OPF policies are rate-function delay-optimal. We
provide the proof of Theorem 1 in Appendix B and give the intuition behind it as follows.

²Although the delay metric in [7] is slightly different from ours, both metrics are closely related. Moreover, the analysis in
[7] is also applicable to prove delay-optimality of DWM for our rate-function as in (2).

January 17, 2013 DRAFT

It is easy to see that the First-come First-serve (FCFS) policy, which serves the oldest packets first, is (sample-path)
delay-optimal in a single-queue single-server system. Also, it is not hard to see that for a multi-queue 
multi-server system with full connectivity, where all pairs of queue and server are connected, a policy
that chooses to serve the oldest packets (over the whole system) first is delay-optimal. These motivate us
to ask a natural and interesting question: If a policy chooses to serve the oldest packets first in a
multi-queue multi-server system with time-varying and partial connectivity (as we consider in this paper), does
it achieve rate-function delay optimality? Note that in such a system, at most n packets can be served in 
each time-slot. Hence, if in each time-slot a policy can serve all the n oldest packets in the system (as in 
the case with full connectivity), this policy should yield optimal delay performance. However, due to the 
random connectivity between queues and servers, no policy may be able to do so. Hence, we propose 
a class of policies that choose to serve the k oldest packets for the largest possible value of k. In other 
words, for any $k \in \{1, 2, \ldots, n\}$, if the k oldest packets can be served by some scheduling policy, then
our proposed policies will serve these k packets too. 
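The "largest possible value of k" in this condition can be computed by a bipartite matching feasibility check. The sketch below is one hedged way to do so, assuming unit-capacity ON-OFF links, packets listed oldest first (ties broken arbitrarily by list order), and a simple augmenting-path search; the function names `try_augment` and `largest_servable_prefix` are hypothetical and are not taken from the paper.

```python
def try_augment(u, adj, match_of_server, visited):
    """Augmenting-path DFS: try to give packet u a server, re-routing earlier packets."""
    for s in adj[u]:
        if s in visited:
            continue
        visited.add(s)
        # Server s is free, or its current packet can move to another server.
        if s not in match_of_server or try_augment(match_of_server[s], adj,
                                                   match_of_server, visited):
            match_of_server[s] = u
            return True
    return False

def largest_servable_prefix(packet_queues, C):
    """Largest k such that the k oldest packets can all be served in one time-slot.

    packet_queues: queue index of each packet, ordered oldest first.
    C[i][j] == 1 iff queue i is connected to server j; each server serves one packet.
    """
    n_servers = len(C[0]) if C else 0
    adj = [[j for j in range(n_servers) if C[q][j] == 1] for q in packet_queues]
    match_of_server = {}
    k = 0
    for u in range(len(adj)):
        # Once a packet is matched it stays matched, so a failure here means
        # no schedule can serve all of the u + 1 oldest packets.
        if not try_augment(u, adj, match_of_server, set()):
            break
        k += 1
    return k

C_example = [[1, 0],
             [1, 1]]
k_example = largest_servable_prefix([1, 0], C_example)
```

In the example, the oldest packet (from queue 1) is first placed on server 0 and then re-routed to server 1 so that the second packet (from queue 0) can use server 0.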

On the other hand, the authors of [7] proposed the Frame Based Scheduling (FBS) policy, and showed
that the FBS policy with an appropriately chosen operating parameter h is rate-function delay-optimal.
We prove Theorem 1 by showing a dominance property of an OPF policy over FBS. Specifically, for
any given sample path and for any value of h, by the end of any time-slot t, an OPF policy has served
every packet that FBS has served. Details are available in Appendix B.

We would like to remark on the significance of Theorem 1. Although the FBS policy (with an
appropriately chosen parameter h) is shown to be rate-function delay-optimal in [7], it is unclear how
to choose the right value of h in practice. Hence, the FBS policy itself does not provide a verifiable
condition for whether other policies are also rate-function delay-optimal. In contrast, the condition for
the OPF policies in Theorem 1 is easy to verify, and can be readily used to design other rate-function
delay-optimal policies. Specifically, Theorem 1 enables us to identify a new policy, called the DWM-n
policy (which we will describe later), that is rate-function delay-optimal, and that substantially reduces the
complexity to $O(n^{2.5} \log n)$. This in turn allows us to design low-complexity hybrid scheduling policies
that are both throughput-optimal and rate-function delay-optimal (in Section IV).

Next, we review the Delay Weighted Matching (DWM) policy proposed in [7], which is also
rate-function delay-optimal. DWM operates in the following way. In each time-slot t, define the weight of
the l-th packet of $Q_i$ as $Z_{i,l}(t)$, i.e., the delay of this packet at the beginning of time-slot t, which is
measured from the time when this packet arrived at queue $Q_i$ until time-slot t. Then, construct a bipartite
graph $G[X \cup Y, E]$ such that the vertices in X correspond to the n oldest packets from each of the n
queues and Y is the set of all servers. Thus, $|X| = n^2$ and $|Y| = n$. Let $X_i \subseteq X$ be the set of packets
from queue $Q_i$. If queue $Q_i$ is connected to server $S_j$, then for each packet $x \in X_i$, there is an edge
between x and $S_j$ in graph G and the weight of this edge is set to the weight of packet x. The schedule
is then determined by a maximum-weight matching over G. In other words, DWM maximizes the sum
of the delays of the packets scheduled.

Note that the DWM policy belongs to the class of OPF policies (which is clear from the proof of
Lemma 7 in [7]). Although the DWM policy is rate-function delay-optimal, it suffers from high
complexity, which renders it impractical. Specifically, DWM has a complexity of $O(n^5)$, since the complexity of
finding a maximum-weight matching [10] over a bipartite graph $G[V, E]$ is $O(|V||E| + |V|^2 \log |V|)$ in
general, and the bipartite graph constructed by DWM has $|V| = O(n^2)$ and $|E| = O(n^3)$.
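As a worked check of this complexity claim, the following sketch counts the vertices and edges of the graph built by DWM. It assumes only what the construction states: X holds n packets from each of the n queues, Y holds the n servers, and each packet vertex has at most one edge per server.

```latex
\begin{align*}
|V| &= |X| + |Y| = n^2 + n = O(n^2), \\
|E| &\le |X| \cdot |Y| = n^2 \cdot n = O(n^3), \\
|V||E| + |V|^2 \log |V| &= O(n^2 \cdot n^3) + O(n^4 \log n) = O(n^5).
\end{align*}
```

The first term dominates, so the edge count of the packet-level graph is what drives the overall cost.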

To overcome the high-complexity issue, we develop a simpler policy that is also in the class of OPF
policies (and is thus also rate-function delay-optimal), but has a much lower complexity of $O(n^{2.5} \log n)$.
The new policy is called DWM-n due to the high-level similarity to DWM, but it exhibits critical
differences when picking packets to construct the bipartite graph $G[X \cup Y, E]$ and finding the
maximum-weight matching over G. The differences are as follows:

1) In each time-slot, instead of considering the n oldest packets from each queue (thus $n^2$ packets
in total) as in DWM, DWM-n considers only the n oldest packets in the whole system. Thus, the
bipartite graph constructed by DWM-n has $|X| = n$ and $|Y| = n$.

2) The rest of the operations of DWM-n are similar to those of DWM, i.e., the schedule is determined
by a maximum-weight matching over G, except that DWM-n finds a maximum-weight matching
based on the vertex weights. Such a maximum-weight matching is also called Maximum
Vertex-weighted Matching (MVM) [11], [12]. Specifically, the weight of each vertex in X is the weight
of the corresponding packet (i.e., the delay of the packet, as defined in DWM), and the weight of
each vertex in Y is set to 0.
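The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm used for the complexity result: it finds a maximum vertex-weighted matching by processing packets from oldest to youngest and searching for an augmenting path for each, which takes $O(n^3)$ time in the worst case rather than $O(n^{2.5} \log n)$. The function name `dwm_n_schedule` and the `(delay, queue_index)` packet representation are hypothetical.

```python
def dwm_n_schedule(packets, C):
    """Sketch of DWM-n for one time-slot.

    packets: list of (delay, queue_index) for all packets in the system.
    C[i][j] == 1 iff queue i is connected to server j (n x n, unit capacity).
    Returns {server index: (delay, queue_index)} for the scheduled packets.
    """
    n = len(C)
    # Step 1: keep only the n oldest packets in the whole system
    # (the sort is stable, so ties keep their input order).
    oldest = sorted(packets, key=lambda p: -p[0])[:n]
    adj = [[j for j in range(n) if C[q][j] == 1] for _, q in oldest]
    server_to_packet = {}

    def augment(u, visited):
        # Try to match packet u, possibly re-routing already matched packets.
        for s in adj[u]:
            if s in visited:
                continue
            visited.add(s)
            if s not in server_to_packet or augment(server_to_packet[s], visited):
                server_to_packet[s] = u
                return True
        return False

    # Step 2: vertex-weighted matching. Heaviest (oldest) packets go first,
    # and a packet that has been matched is never unmatched later.
    for u in range(len(oldest)):
        augment(u, set())
    return {s: oldest[u] for s, u in server_to_packet.items()}

C_example = [[1, 0, 0],
             [1, 0, 0],
             [0, 1, 1]]
packets_example = [(5, 0), (4, 1), (3, 2), (1, 2)]
schedule = dwm_n_schedule(packets_example, C_example)
```

In this example only the three oldest packets are considered, so the packet with delay 1 is ignored and one server connected to its queue is left unused.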

In the following proposition, we show that DWM-n policy is rate-function delay-optimal and has a 
low complexity. 

Proposition 2: DWM-n policy is an OPF policy, and is thus rate-function delay-optimal under
Assumptions 2 and 3. Further, DWM-n policy has a low complexity of $O(n^{2.5} \log n)$.

We provide the proof in Appendix C. The fact that the DWM-n policy is an OPF policy follows from a
property of MVM [11] that if there exists a matching that matches all of the k heaviest vertices, then any
MVM matches all of the k heaviest vertices as well. The low complexity of DWM-n follows immediately
from the fact that DWM-n reduces the number of packets under consideration (n packets in total), and
that an MVM in an n x n bipartite graph can be found in $O(n^{2.5} \log n)$ time [11]. Note that even if the DWM
policy adopts MVM when determining the schedule, its complexity can only be improved to $O(n^4 \log n)$.

Although DWM-n policy achieves rate-function delay optimality with a low complexity, it may not 
be throughput-optimal in general. This is because DWM-n policy considers only the n oldest packets in 
the system. It is likely that certain servers may not be connected to any of the queues that contain these 
n packets, which results in the server being idle and thus a waste of service. Hence, DWM-n is a lazy
policy. In fact, we can construct a simple counter-example to show that DWM-n policy is, in general,
not throughput-optimal as stated in Proposition 3.

Proposition 3: DWM-n policy is not throughput-optimal in general. 

We prove Proposition 3 by constructing a special arrival pattern that forces certain servers to be idle,
even when they can serve some of the queues. We provide the proof in Appendix D. Proposition 3
suggests that a rate-function delay-optimal policy may not have good throughput performance (for a
fixed n). This may appear counter-intuitive at first glance. However, it should be noted that the
rate-function delay optimality is studied in the asymptotic regime, i.e., when n goes to infinity. Although the
convergence of the rate-function is typically fast, the throughput performance may be poor for small to
moderate values of n. Our simulation results (Fig. 2(b) in Section V) will provide further evidence of
this.

B. Throughput Optimality 

In this section, we present a sufficient condition for throughput optimality in very general settings. 

Recall that $Q_i(t)$ denotes the length of queue $Q_i$ at the beginning of time-slot t immediately after packet
arrivals, $Z_{i,l}(t)$ denotes the delay of the l-th packet of $Q_i$ at the beginning of time-slot t, $W_i(t) = Z_{i,1}(t)$
denotes the HOL packet delay of $Q_i$ at the beginning of time-slot t, and $C_{i,j}(t)$ denotes the connectivity
between $Q_i$ and $S_j$ in time-slot t. Let $\mathcal{S}_j(t)$ denote the set of queues connected to server $S_j$ in
time-slot t, i.e., $\mathcal{S}_j(t) = \{1 \le i \le n \mid C_{i,j}(t) = 1\}$, and let $\Gamma_j(t)$ denote the subset of queues in $\mathcal{S}_j(t)$
that have the largest weight in time-slot t, i.e., $\Gamma_j(t) = \{i \in \mathcal{S}_j(t) \mid W_i(t) = \max_{l \in \mathcal{S}_j(t)} W_l(t)\}$. We now
present the main result of this section.

Theorem 4: Let $i(j,t)$ be the index of the queue that is served by server $S_j$ in time-slot t, under a
scheduling policy P. Under Assumption 1, policy P is throughput-optimal if there exists a constant $M > 0$
such that, in any time-slot t and for all $j \in \{1, 2, \ldots, n\}$, queue $Q_{i(j,t)}$ satisfies $W_{i(j,t)}(t) \ge Z_{r,M}(t)$
for all $r \in \Gamma_j(t)$ such that $Q_r(t) \ge M$.

We prove Theorem 4 using fluid limit techniques [9], [13], and provide the proof in Appendix E. The
condition in Theorem 4 means the following: In each time-slot, each server chooses to serve a queue
with HOL packet delay no less than the delay of the M-th packet in the queue with the largest HOL
delay (among the queues connected to the server); if this queue (with the largest HOL delay) has fewer
than M packets, then the server may choose to serve any queue.
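The per-server condition described above can be checked mechanically. The sketch below is a hedged illustration for a single server in a single time-slot; the function name `satisfies_mwf_condition`, the list-of-delays representation, and the convention that an empty queue has HOL delay 0 are all assumptions made for this example.

```python
def satisfies_mwf_condition(served, delays, connected, M):
    """Check the sufficient condition of Theorem 4 for one server in one time-slot.

    served: index of the queue this server serves.
    delays[i]: delays of the packets in queue i, HOL packet first.
    connected: indices of the queues connected to this server.
    M: the constant from the theorem (1-based packet position).
    """
    def hol(i):
        # Assumption: an empty queue is treated as having HOL delay 0.
        return delays[i][0] if delays[i] else 0

    if not connected:
        return True  # nothing to compare against
    max_hol = max(hol(i) for i in connected)
    # Queues with the largest HOL delay among those connected to the server.
    gamma = [i for i in connected if hol(i) == max_hol]
    # Only queues holding at least M packets impose a constraint.
    return all(hol(served) >= delays[r][M - 1]
               for r in gamma if len(delays[r]) >= M)

delays_example = [[10, 9, 2], [8, 1], [3]]
ok_m2 = satisfies_mwf_condition(1, delays_example, [0, 1, 2], M=2)  # needs 8 >= 9
ok_m3 = satisfies_mwf_condition(1, delays_example, [0, 1, 2], M=3)  # needs 8 >= 2
```

Because the check involves only one server at a time, it can be applied to each server independently of the others.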

It is well-known that the MaxWeight Scheduling (MWS) policy (H, S, lO-ini that maximizes the 
weighted sum of the rates, where the weights are queue lengths or delays, is throughput-optimal in very 
general settings, including the multi-channel system that we consider in this paper. The intuition behind 
Theorem 4 is that to achieve throughput optimality in our multi-channel system, it is sufficient for each
server to choose a connected queue with a large enough weight such that this queue has the largest weight 
in the fluid limit. This relaxes the condition that each server has to find a queue with the largest weight 
in the original system, and thus significantly expands the set of known throughput-optimal policies. 

Next, we define the class of Maximum Weight in the Fluid limit (MWF) policies as follows.

Definition 2: A policy P is said to be in the class of Maximum Weight in the Fluid limit (MWF)
policies if policy P satisfies the sufficient condition in Theorem 4.

Clearly, all MWF policies are throughput-optimal. It is claimed in [7] that the DWM policy
is throughput-optimal, yet the throughput optimality was not explicitly proved there. For completeness,
we state the following proposition on throughput optimality of the DWM policy, and provide its proof in
Appendix F.

Proposition 5: DWM policy is an MWF policy, and is thus throughput-optimal under Assumption 1.

Next, we study a simple extension of the delay-based MaxWeight policy JH, H, ifTTl that is
throughput-optimal in our multi-channel system.

Delay-based MaxWeight Scheduling (D-MWS) policy: In each time-slot t, the scheduler allocates each
server $S_j$ to serve queue $Q_{i(j,t)}$ such that $i(j,t) = \min\{i \mid i \in \Gamma_j(t)\}$. In other words, each server chooses
to serve a queue that has the largest HOL delay (among all the queues connected to this server), breaking
ties by picking the one with the smallest index if there are multiple such queues.
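A minimal sketch of this rule is given below. The function name `d_mws_schedule` and the convention of returning `None` for a server with no connected queue are assumptions made for illustration.

```python
def d_mws_schedule(hol_delay, C):
    """Sketch of D-MWS for one time-slot.

    hol_delay[i]: HOL delay of queue i.
    C[i][j] == 1 iff queue i is connected to server j.
    Returns a list whose j-th entry is the queue served by server j,
    or None if server j is connected to no queue.
    """
    n_queues, n_servers = len(C), len(C[0])
    schedule = []
    for j in range(n_servers):
        connected = [i for i in range(n_queues) if C[i][j] == 1]
        if not connected:
            schedule.append(None)
            continue
        best = max(hol_delay[i] for i in connected)
        # Largest HOL delay wins; ties go to the smallest queue index.
        schedule.append(min(i for i in connected if hol_delay[i] == best))
    return schedule

hol_example = [3, 7, 7]
C_example = [[1, 1, 0],
             [1, 0, 0],
             [1, 1, 0]]
result = d_mws_schedule(hol_example, C_example)
```

Each server scans the queues once and ignores the choices of the other servers, so several servers may pick queues with the same HOL delay while other queues receive no service.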

It is easy to see that the D-MWS policy is an MWF policy and is thus throughput-optimal. Also, it is
worth noting that the D-MWS policy has a low complexity of $O(n^2)$ in our multi-channel system. However,
we can show that D-MWS suffers from poor delay performance. Specifically, we show in the following 
proposition that under D-MWS policy, the probability that the largest HOL delay exceeds any fixed 
threshold, is at least a constant, even if n is large. This results in a zero rate-function. 

Proposition 6: Consider i.i.d. Bernoulli arrivals, i.e., in each time-slot, and for each user, there is a
packet arrival with probability p, and no arrivals otherwise. By allocating servers to queues according to
D-MWS, we have that

$$\limsup_{n \to \infty} -\frac{1}{n} \log P(W(0) > b) = 0, \quad (3)$$

for any fixed integer $b > 0$.

We provide the proof in Appendix G. The intuition behind Proposition 6 is the following. Note
that under D-MWS, each server chooses to serve a connected queue having the largest weight without
accounting for the decisions of the other servers. This way of allocating servers may incur an unbalanced
schedule such that in each time-slot, with high probability, only a small fraction of the queues ($O(\log n)$
out of n queues) get served, while the number of queues having arrivals is much larger ($O(n)$). This then
leads to poor delay performance. By an argument similar to that in Theorem 3 of [4] (where the authors
show that the Queue-length-based MaxWeight Scheduling (Q-MWS) policy results in a zero queue-length
rate-function), we can show that under D-MWS, the delay-violation event occurs with at least a constant
probability for any fixed threshold even if n is large.
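The imbalance can be observed in a small experiment. The sketch below is illustrative only and rests on simplifying assumptions: all queues are non-empty, HOL delays are distinct with queue 0 the oldest, and connectivity is sampled lazily, which is equivalent in distribution to drawing the full Bernoulli(q) matrix. The function name `distinct_queues_served` is hypothetical.

```python
import random

def distinct_queues_served(n, q, seed):
    """Count how many distinct queues receive service in one D-MWS time-slot.

    Assumption: queue i has HOL delay n - i, so lower indices are older and
    there are no ties. Each server serves its oldest connected queue.
    """
    rng = random.Random(seed)
    served = set()
    for _ in range(n):              # one pass per server
        for i in range(n):          # scan queues from oldest to youngest
            if rng.random() < q:    # first connected queue has the largest HOL delay
                served.add(i)
                break
    return len(served)

count = distinct_queues_served(200, 0.5, seed=0)
```

With q = 0.5, a server reaches the k-th oldest queue only if it is disconnected from all older ones, which happens with probability $0.5^{k-1}$, so the servers concentrate on a handful of the oldest queues.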

We conclude this section with a summary of the scheduling policies proposed and/or discussed in this
section. Although the DWM policy is both throughput-optimal and rate-function delay-optimal, it incurs an
impractically high complexity. Another rate-function delay-optimal policy, the FBS policy, is also impractical,
as it needs to choose an appropriate operating parameter (that depends on the arrival processes) to achieve
rate-function delay optimality, and it may not be throughput-optimal. Our analysis shows that our proposed
DWM-n policy is rate-function delay-optimal and substantially reduces the complexity to $O(n^{2.5} \log n)$,
but it is not throughput-optimal either. Further, we show that a simple throughput-optimal policy, the D-MWS
policy, suffers from a zero rate-function.

IV. Hybrid Policies 

It is clear from the previous section that a policy that satisfies the sufficient conditions in Theorems 1 and
4 is both throughput-optimal and rate-function delay-optimal. It remains, however, to find such a policy
with a low complexity. Interestingly, our carefully chosen sufficient conditions possess the following
special features, which allow us to construct a low-complexity hybrid policy that is both rate-function
delay-optimal and throughput-optimal:

• The sufficient condition for throughput optimality has a decoupling feature, in the sense that the 
condition can be separately verified for each individual server. 

• The sufficient condition for rate-function delay optimality guarantees not only rate-function delay
optimality itself, but also that all scheduled servers for the n oldest packets satisfy the sufficient
condition for throughput optimality.

Hence, by exploiting the above useful features of our sufficient conditions, we can now develop a class of
two-stage hybrid OPF-MWF policies that run an OPF policy (focusing on the n oldest packets only) in
stage 1, and run an MWF policy in stage 2 over the remaining servers (those not allocated in stage 1)
only. We will then show that all policies in this class of hybrid OPF-MWF policies are both rate-function
delay-optimal and throughput-optimal. In particular, we can find simple OPF-MWF policies with a low
complexity of $O(n^{2.5} \log n)$.

We now formally define the class of two-stage hybrid OPF-MWF policies. 

Definition 3: A scheduling policy P is said to be in the class of hybrid OPF-MWF policies, if the
following conditions are satisfied under policy P: In each time-slot t, there are two stages:

1) in stage 1, it runs an OPF policy over the n oldest packets only;

2) in stage 2, let $R(t)$ denote the set of servers that are not allocated by the OPF policy in stage 1,
and let $i(j,t)$ be the index of the queue that is matched by server $S_j$ for $j \in R(t)$ in stage 2. There
exists a constant $M > 0$ such that in any time-slot t and for all $j \in R(t)$, queue $Q_{i(j,t)}$ satisfies
that $W_{i(j,t)}(t) \ge Z_{r,M}(t)$ for all $r \in \mathcal{S}_j(t)$ such that $Q_r(t) \ge M$. In other words, it runs an MWF
policy over the system with the remaining servers and packets.

In the following theorem, we show that all policies in the class of OPF-MWF policies are both
rate-function delay-optimal and throughput-optimal.

Theorem 7: Any hybrid OPF-MWF policy is rate-function delay-optimal under Assumptions 2 and 3,
and is throughput-optimal under Assumption 1.

We provide the proof in Appendix H and give the intuition behind it as follows:

• In stage 1, an OPF policy not only guarantees rate-function delay optimality, but also satisfies the 
sufficient condition for throughput optimality for all allocated servers in this stage. 

• The allocated servers and packets in stage 1 will not be considered in stage 2. In stage 2, we run 
an MWF policy for the remaining servers and packets only. Hence, it ensures that the sufficient 
condition for throughput optimality is satisfied for the remaining servers as well. Since the allocated 
servers and packets in stage 1 are not touched in stage 2, the satisfaction of the sufficient condition 
for delay optimality is not perturbed, and the sufficient condition for throughput optimality is also 
satisfied. 

We note that the idea of combining different policies into (heuristic) hybrid policies to improve the
overall performance is not new. However, since our goal in this paper is to attain provable optimality in
terms of both throughput and delay, the task of designing the right hybrid policy becomes much more
challenging. Further, not all combinations of the OPF and MWF policies necessarily lead to the
desired hybrid policies. For example, it is unclear whether the sufficient condition for throughput optimality
can be satisfied if we instead run an MWF policy in stage 1 and do post-processing by applying an
OPF policy in stage 2. In this case, because the servers allocated by an MWF policy in stage 1 can
be reallocated in stage 2, the sufficient condition for throughput optimality may no longer hold.
In contrast, our solutions exploit the special features of our carefully chosen sufficient conditions, and
combine different policies in the right way, to achieve the optimal performance for both
throughput and delay.

There are still many policies in the class of hybrid OPF-MWF policies. In the following, as an example,
we show that the DWM-n policy combined with the D-MWS policy yields an $O(n^{2.5} \log n)$-complexity hybrid
OPF-MWF policy that is both throughput-optimal and rate-function delay-optimal. Let this policy be
called the DWM-n-MWS policy. Then, we present the main result of this paper in the following theorem.

Theorem 8: DWM-n-MWS policy is in the class of hybrid OPF-MWF policies, and is thus both
throughput-optimal and rate-function delay-optimal. Further, DWM-n-MWS policy has a complexity of
$O(n^{2.5} \log n)$.

To show that DWM-n-MWS is a hybrid OPF-MWF policy, it suffices to show that Condition 2) of
Definition 3 is satisfied. We provide the proof in Appendix I.
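
To make the two-stage structure concrete, the following is a minimal sketch of one time-slot of a DWM-n-MWS-style policy. It rests on several assumptions that are ours and not part of the formal definition: packets are represented by their arrival times, stage 1 uses a simple greedy augmenting-path matching (which yields a maximum vertex-weighted matching but takes roughly cubic time, so it is not the faster matching algorithm behind the stated complexity), and stage 2 uses the HOL delays of the packets left after stage 1. All function and variable names are hypothetical.

```python
def greedy_mvm(packets, connected, n_servers):
    """Match packets (sorted oldest first) to servers.

    packets[k] = (queue index, arrival time). Returns {server: packet index}.
    Adding vertices in decreasing weight order whenever an augmenting path
    exists gives a maximum vertex-weighted matching in a bipartite graph.
    """
    packet_at = {}  # server j -> index of the packet it serves

    def try_assign(k, seen):
        queue = packets[k][0]
        for j in range(n_servers):
            if connected[queue][j] and j not in seen:
                seen.add(j)
                # Take a free server, or re-route the packet currently on it.
                if j not in packet_at or try_assign(packet_at[j], seen):
                    packet_at[j] = k
                    return True
        return False

    for k in range(len(packets)):  # oldest (heaviest) packets get priority
        try_assign(k, set())
    return packet_at


def dwm_n_mws(queues, connected, t):
    """One slot of the sketch. queues[i] lists arrival times, oldest first.

    Returns {server: queue index} for every server that is allocated.
    """
    n = len(connected[0])  # number of servers
    # Stage 1: only the n oldest packets in the system are candidates.
    candidates = [(i, a) for i, q in enumerate(queues) for a in q[:n]]
    candidates.sort(key=lambda pkt: (pkt[1], pkt[0]))  # ties: smaller queue index
    oldest = candidates[:n]
    schedule = {j: oldest[k][0] for j, k in greedy_mvm(oldest, connected, n).items()}

    # Packets of each queue already taken in stage 1 (served in FIFO order).
    served = [0] * len(queues)
    for i in schedule.values():
        served[i] += 1
    # HOL delay of each queue among the remaining packets (None if empty).
    hol = [t - q[served[i]] if served[i] < len(q) else None
           for i, q in enumerate(queues)]

    # Stage 2: D-MWS over the servers left unallocated by stage 1.
    for j in range(n):
        if j in schedule:
            continue
        best = None
        for i in range(len(queues)):
            if connected[i][j] and hol[i] is not None and (best is None or hol[i] > hol[best]):
                best = i
        if best is not None:
            schedule[j] = best
    return schedule


# Server 1 cannot reach queue 0, so stage 1 leaves it idle; stage 2 uses it.
print(dwm_n_mws([[0, 0], [1]], [[True, False], [True, True]], t=2))
```

In the usage line, the two oldest packets both sit in queue 0, which only server 0 can reach. Stage 1 alone would therefore leave server 1 idle, while stage 2 assigns it to queue 1.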

V. Simulation Results 

In this section, we conduct simulations to compare the performance of the scheduling policies proposed
or discussed in this paper, where the Hybrid policy we consider is the DWM-n-MWS policy. We also
compare the delay performance of our proposed policies with that of two queue-length-based policies
(i.e., policies that use queue lengths instead of delays to calculate weights when making scheduling
decisions): the iLQF with PullUp algorithm (called iLQF for simplicity) and Q-MWS, which have been
studied in [3], [4]. We implement and simulate these policies in Java, and compare the empirical
probabilities that the largest HOL delay in the system in any given time-slot exceeds a constant b, i.e.,
$P(W(0) > b)$, where $W(t) = \max_{1 \le i \le n} W_i(t)$.

For the arrival processes, we consider bursty arrivals that are driven by a two-state Markov chain and 
are thus correlated over time. (We obtained similar results for i.i.d. arrivals over time, but omit them 
here due to space constraints.) We adopt the same parameter settings as in Q. For each user, there are 5 
packet-arrivals when the Markov chain is in state 1, and no arrivals when the Markov chain is in state 2. 
The transition probability of the Markov chain is given by the matrix [0.5, 0.5; 0.1, 0.9], and the state
transitions occur at the end of each time-slot. The arrivals for each user are correlated over time, but
they are independent across users. For the channel model, we first assume i.i.d. ON-OFF channels (as in
Assumption 4) and set q = 0.75, and later consider more general scenarios with heterogeneous users and
bursty channels that are correlated over time. We run simulations for a system with $n \in \{10, 20, \dots, 100\}$.
The simulation period lasts for 10^ time-slots for each policy and each system.

Fig. 2. Performance comparison of different scheduling policies in the case with homogeneous i.i.d. channels.
(Panel title: Markov-chain driven arrivals, b=2.)
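
As a quick sanity check on the bursty arrival process described above, the stationary distribution of the two-state chain and the resulting mean arrival rate per user can be computed in closed form. The sketch below is illustrative only; the helper name `stationary_two_state` is hypothetical.

```python
from fractions import Fraction


def stationary_two_state(p12, p21):
    """Stationary distribution (pi1, pi2) of a two-state Markov chain.

    p12 = P(state 1 -> state 2), p21 = P(state 2 -> state 1).
    Follows from the balance equation pi1 * p12 = pi2 * p21.
    """
    total = p12 + p21
    return p21 / total, p12 / total


# Transition matrix [0.5, 0.5; 0.1, 0.9]: p12 = 0.5 and p21 = 0.1.
pi1, pi2 = stationary_two_state(Fraction(1, 2), Fraction(1, 10))
mean_arrivals = 5 * pi1  # 5 packets per slot in state 1, none in state 2
print(pi1, pi2, mean_arrivals)
```

Under these parameters the chain spends one sixth of the time in state 1, so each user receives 5/6 packets per time-slot on average.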

Fig. 3. Performance comparison of different scheduling policies in the case with Markov-chain driven
heterogeneous channels, for b=2.

The results are summarized in Fig. 2, where the complexity of each policy is labeled. In order to
compare the rate-function I(b) as defined in Eq. (2), we plot the probability over the number of channels
or users, i.e., n, for a fixed value of threshold b. In Fig. 2(a), we compare the rate-function I(b) of different
scheduling policies for b = 2. The negative of the slope of each curve can be viewed as the rate-function
for the corresponding policy. From Fig. 2(a), we observe that the Hybrid and DWM-n policies perform
closely to DWM, and that D-MWS and Q-MWS have a zero rate-function, which supports our analytical
results. Further, the results show that the delay-based policies (DWM, DWM-n and Hybrid) consistently
outperform iLQF in terms of delay performance, despite the fact that iLQF is rate-function (queue-length)
optimal [3], [4]. This provides further evidence of the fact that good queue-length performance does not
necessarily translate to good delay performance.
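
The slope-based reading of the rate-function can be sketched as a least-squares fit of the logarithm of the delay-violation probability against n. The data below are synthetic and serve only to illustrate the computation; they are not our simulation results, and the function name is hypothetical.

```python
import math


def estimate_rate_function(ns, probs):
    """Negative least-squares slope of log(prob) versus n."""
    ys = [math.log(p) for p in probs]
    n_mean = sum(ns) / len(ns)
    y_mean = sum(ys) / len(ys)
    num = sum((n - n_mean) * (y - y_mean) for n, y in zip(ns, ys))
    den = sum((n - n_mean) ** 2 for n in ns)
    return -num / den


ns = list(range(10, 101, 10))
decaying = [math.exp(-0.3 * n) for n in ns]  # synthetic: decays exponentially in n
flat = [0.2] * len(ns)                       # synthetic: stays at a constant level
print(estimate_rate_function(ns, decaying), estimate_rate_function(ns, flat))
```

A curve that decays exponentially in n gives a positive estimate, while a curve that stays at a constant level gives an estimate of zero, which is the behavior referred to as a zero rate-function.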

We also plot the probability over delay threshold b as in iBl-lll, Q to investigate the performance of 
different policies when n is small. In Fig. 2(b), we report the results for n = 10 and $b \in \{0, 1, 2, \dots, 29\}$.
From Fig. 2(b), we observe that the Hybrid policy consistently performs closely to DWM for almost all
values of b that we consider, while DWM-n is worse than DWM. This is because DWM-n may not
schedule all the servers, and the probability that some of the servers are kept idle can be significant when
n is small.

Finally, we evaluate the performance of different scheduling policies in more realistic scenarios, where
users are heterogeneous and channels are correlated over time. Specifically, we consider channels that can
be modeled by a two-state Markov chain, where the channel is "ON" when the Markov chain is in state 1,
and is "OFF" when the Markov chain is in state 2. This type of channel model can be viewed as a special
case of the Gilbert-Elliott model that is widely used for describing bursty channels. We assume that there
are two classes of users: users with an odd index are called near-users, and users with an even index are
called far-users. Different classes of users see different channel conditions: near-users see better channel
conditions, and far-users see worse channel conditions. We assume that the transition probability matrices of
channels for near-users and far-users are [0.833, 0.167; 0.5, 0.5] and [0.5, 0.5; 0.167, 0.833], respectively.
The arrival processes are assumed to be the same as in the previous case. Also, the delay requirements
are assumed to be the same for different classes of users, i.e., we still consider the probability that the
largest HOL delay exceeds a fixed threshold, without distinguishing different classes of users.
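
To give a feel for how different the two user classes are, the following sketch simulates a sample path of the two-state ON-OFF channel and reports the fraction of time-slots in which the channel is ON. It is an illustration under our own assumptions (fixed seed, channel starting in the ON state, hypothetical function name), not the simulation code used for Fig. 3.

```python
import random


def simulate_on_fraction(p_on_to_off, p_off_to_on, n_slots, seed=0):
    """Fraction of slots in which a two-state ON-OFF channel is ON."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    on = True                  # assumption: the channel starts in the ON state
    on_slots = 0
    for _ in range(n_slots):
        on_slots += on
        flip = p_on_to_off if on else p_off_to_on
        if rng.random() < flip:
            on = not on        # state transition at the end of the slot
    return on_slots / n_slots


# Near-users: [0.833, 0.167; 0.5, 0.5]; far-users: [0.5, 0.5; 0.167, 0.833].
near = simulate_on_fraction(0.167, 0.5, 200_000)
far = simulate_on_fraction(0.5, 0.167, 200_000)
print(near, far)
```

The empirical fractions can be compared with the stationary ON probabilities 0.5/(0.167 + 0.5) ≈ 0.75 for near-users and 0.167/(0.5 + 0.167) ≈ 0.25 for far-users.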

The results are summarized in Fig. 3. We observe similar results as in the previous case, where channels
are i.i.d. in time. In particular, our low-complexity policies (DWM-n and Hybrid) again perform closely
to DWM, in terms of rate-function, although the delay-violation probability is a bit smaller under DWM
when n is not large (i.e., n < 50), which is expected. Note that in this scenario, rate-function
delay-optimal policies are not known yet. For future work, it would be interesting to explore whether our
proposed policies can achieve optimality of both throughput and delay in more general scenarios.

VI. Conclusion 

In this paper, we addressed the question of designing low-complexity scheduling policies that provide 
optimal performance of both throughput and delay in multi-channel systems. We derived simple and 
easy-to-verify sufficient conditions for throughput optimality and rate-function delay optimality, which 
allowed us to later develop a class of low-complexity hybrid policies that simultaneously achieve both 
throughput optimality and rate-function delay optimality. 

Our work in this paper leads to many interesting questions that are worth exploring in the future. 
It would be interesting to know if one can further relax the sufficient conditions, and design even 
simpler policies that can provide optimal performance for both throughput and delay. Further, it would be 
worthwhile to analytically characterize the fundamental trade-off between performance and complexity. 

Finally, it is very important to investigate the scheduling problem in more realistic scenarios, e.g., 
accounting for more general multi-rate channels that are correlated over time, rather than i.i.d. ON-OFF 
channels, and heterogeneous users with different statistics as well as different delay requirements. Our 
hope is to find efficient schedulers that can guarantee a nontrivial lower bound of the optimal rate-function, 
if it is too hard to achieve (or prove) the optimal delay performance itself in more general scenarios.

Appendix A
The Optimal Throughput Region $\Lambda^*$

We can characterize the optimal throughput region $\Lambda^*$ of our multi-channel systems in a similar manner
to that of single-channel systems in [9].

We start with discussions for a single-channel system with n users in a more general setting.
Specifically, suppose that there is a finite set $\mathcal{M} = \{1, 2, \dots, |\mathcal{M}|\}$ of global server states (where the server
state accounts for the state of the links between the server and all users). For each state $m \in \mathcal{M}$, there
is an associated service rate vector $r^m = [r_i^m, 1 \le i \le n]$, where $r_i^m$ is the maximum number of packets
that can be transmitted to $Q_i$ when the server is in state m (under Assumption 4 of ON-OFF channels,
we have $r_i^m \in \{0, 1\}$ for all m and i). We assume that the random channel state process is a stationary
and ergodic discrete-time Markov chain within the state space $\mathcal{M}$. We let $\pi = [\pi_m, m \in \mathcal{M}]$ denote the
stationary distribution of this Markov chain, where $\pi_m > 0$ for all $m \in \mathcal{M}$.

As in [9], consider a Static Service Split (SSS) policy, associated with an $|\mathcal{M}| \times n$ stochastic matrix
$\phi = [\phi_{m,i}, m \in \mathcal{M}, 1 \le i \le n]$, where $\phi_{m,i} \ge 0$ for all m and i, and $\sum_{1 \le i \le n} \phi_{m,i} = 1$ for every m.
Under the SSS policy, the server chooses to serve $Q_i$ with probability $\phi_{m,i}$ when the server is in state m.
Clearly, the (long-term average) service rate vector can be represented by $v = [v_i, 1 \le i \le n] = v(\phi)$,
where $v_i = \sum_{m \in \mathcal{M}} \pi_m \phi_{m,i} r_i^m$. Then, the set of all feasible (long-term average) service rate vectors can
be represented as

\[ \mathcal{R} = \{ v \mid v = v(\phi) \text{ for some stochastic matrix } \phi \}. \]

Hence, the optimal throughput region can be represented as

\[ \Lambda^* = \{ \lambda \mid \lambda \le v \text{ for some vector } v \in \mathcal{R} \}. \]

Now, consider a multi-channel system with n orthogonal channels. Let $\mu_{i,j}$ denote the feasible
(long-term average) service rate that can be allocated to queue $Q_i$ from server $S_j$, and let the vector
$\mu_j = [\mu_{i,j}, 1 \le i \le n]$ denote a feasible service rate allocation by server $S_j$. For each server $S_j$, the set of all
such feasible vectors $\mu_j$ is denoted by $\mathcal{R}_j$. Note that the characterization of $\mathcal{R}_j$ has already accounted for
the time-varying channel-states. Let $\mu = [\mu_j, 1 \le j \le n]$ denote a feasible service rate matrix, and the
set of all such feasible matrices $\mu$ can be represented as $\mathcal{R} = \mathcal{R}_1 \times \mathcal{R}_2 \times \cdots \times \mathcal{R}_n$. Hence, the optimal
throughput region $\Lambda^*$ can be represented as

\[ \Lambda^* = \Big\{ \lambda \;\Big|\; \lambda_i \le \sum_{j=1}^{n} \mu_{i,j} \text{ for all } i, \text{ for some matrix } \mu \in \mathcal{R} \Big\}. \]

Note that our multi-channel system under Assumption 4 of the ON-OFF channel model is a special case
of the above scenario.
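
The service rate vector of an SSS policy, whose i-th entry is the sum over server states m of (stationary probability of m) x (split probability of user i in state m) x (rate of user i in state m), can be evaluated directly. The sketch below does so for a toy single-server system with two users and i.i.d. ON-OFF links with q = 1/2; this example and the function name are our own illustrative assumptions.

```python
from fractions import Fraction


def sss_service_rates(pi, phi, r):
    """Long-term average service rates of a Static Service Split policy.

    pi[m]     -- stationary probability of server state m
    phi[m][i] -- probability of serving user i in state m (rows sum to 1)
    r[m][i]   -- packets that can be sent to user i in state m
    """
    n_users = len(r[0])
    return [sum(pi[m] * phi[m][i] * r[m][i] for m in range(len(pi)))
            for i in range(n_users)]


half = Fraction(1, 2)
# Four joint link states for two users: (OFF,OFF), (ON,OFF), (OFF,ON), (ON,ON).
r = [[0, 0], [1, 0], [0, 1], [1, 1]]
pi = [Fraction(1, 4)] * 4                     # i.i.d. links with q = 1/2
phi = [[1, 0], [1, 0], [0, 1], [half, half]]  # split evenly when both are ON
print(sss_service_rates(pi, phi, r))
```

With this split, each user is served at rate 3/8, and the total 3/4 equals the probability that at least one link is ON.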

Appendix B 
Proof of Theorem 1

The proof follows an argument similar to that used for proving rate-function delay optimality of the DWM
policy (Lemma 7 of [18]).

Consider a policy satisfying the sufficient condition in Theorem 1, say policy P. We want to show a
dominance property of policy P over the FBS policy. Specifically, for any given sample path and for any
value of h, by the end of any time-slot t, policy P has served every packet that FBS has served.

We start by reviewing the operations of the FBS policy. The FBS policy serves packets in units of frames.
With a given positive integer h as the operating parameter, each frame is constructed such that: 1) the
difference of arrival times of any two packets within a frame must be no greater than h; 2) the total
number of packets in each frame is no greater than $n_0 = n - Lh$. In each time-slot, the packets that arrived
at the beginning of this time-slot are filled into the last frame until either of the above two conditions
would be violated. A new frame is opened if either of the conditions would be violated for the last frame.
The packets in the queues with a smaller queue index are filled into the last frame with priority. The FBS
policy can serve all the $n_0$ oldest packets (that belong to the HOL frame) in each time-slot with high probability
for large enough n, and does not serve any packets otherwise. We refer readers to Q for the detailed 
discussions about FBS policy. It has been shown that FBS policy with certain values of h is rate-function 
delay-optimal (Theorem 2 of [1]).

Now, consider two queueing systems, $\mathcal{Q}_1$ and $\mathcal{Q}_2$, both of which have the same arrival and channel
realizations. We assume that $\mathcal{Q}_1$ adopts policy P and $\mathcal{Q}_2$ adopts the FBS policy. We define the weight of a
packet p in time-slot t as its delay, i.e., $w(p) = t - t_p$, where $t_p$ denotes the time when packet p arrives
to the system. Note that different packets (in the same queue or in different queues) may have the same
delays. In order to make each packet in the system have a unique weight, we redefine the weight of a 
packet p as = t — tp+ "^^^^ + j^^^^^^^, where qp denotes the index of the queue where packet 
p is queued and $x_p$ denotes that packet p is the $x_p$-th arrival to queue $Q_{q_p}$ in time-slot $t_p$. For two packets
$p_1$ and $p_2$, we say $p_1$ is older than $p_2$ if $\hat{w}(p_1) > \hat{w}(p_2)$. It must be noted that, as in [18], we use weight
$\hat{w}(\cdot)$ instead of $w(\cdot)$ for ease of analysis only.

Let $R_i(t)$ represent the set of packets present in the system $\mathcal{Q}_i$ at the end of time-slot t, for i = 1, 2.
Then, it suffices to show that $R_1(t) \subseteq R_2(t)$ for all time t. We denote by $A(t)$ the set of packets that
arrive at time t. Let $X_i(t)$ denote the set of packets that depart the system $\mathcal{Q}_i$ at time t, for i = 1, 2.
Hence, we have

\[ R_i(t+1) = (R_i(t) \cup A(t+1)) \setminus X_i(t+1), \quad \text{for } i = 1, 2. \]

We then proceed with the proof by contradiction. Suppose that $R_1(t) \nsubseteq R_2(t)$ for some time t. Without
loss of generality, we assume that $\tau$ is the first time such that $R_1(\tau) \nsubseteq R_2(\tau)$ occurs. Hence, there must
exist a packet, say p, such that $p \in R_1(\tau)$ and $p \notin R_2(\tau)$. Because $\tau$ is the first time when such an event
occurs, packet p must depart from the system $\mathcal{Q}_2$ in time-slot $\tau$, i.e., $p \in X_2(\tau)$.

Let $B_i(v)$ denote the set of packets in $R_i(\tau - 1) \cup A(\tau)$ with weight greater than or equal to v, for
i = 1, 2. Clearly, we have $B_1(v) \subseteq B_2(v)$ for all v, as $R_1(\tau - 1) \subseteq R_2(\tau - 1)$ by assumption. Since
packet p is served in the system $\mathcal{Q}_2$ in time-slot $\tau$, we know from the operations of FBS that all packets
in $B_2(\hat{w}(p))$ must also be served in time-slot $\tau$. This is because packet p is part of the HOL frame in
time-slot $\tau$ (as packet p is served in time-slot $\tau$), and all packets with a weight greater than $\hat{w}(p)$ must be
filled into frames with higher priority than packet p and should thus also belong to the HOL frame in
time-slot $\tau$. This further implies that in the system $\mathcal{Q}_1$, there exists a feasible schedule that can match all
packets in $B_1(\hat{w}(p))$, since $B_1(\hat{w}(p)) \subseteq B_2(\hat{w}(p))$ and both systems have the same channel realizations.

Now, from the sufficient condition in Theorem 1, policy P will serve all packets in $B_1(\hat{w}(p))$, including
packet p. However, this contradicts the claim that packet p is not served (by policy P) in the system
$\mathcal{Q}_1$ in time-slot $\tau$ (i.e., $p \in R_1(\tau)$).

So far, we have shown that for any given sample path and for any value of h, by the end of any 
time-slot t, policy P has served every packet that FBS has served. This completes the proof from the 
fact that FBS policy is rate-function delay-optimal. 

Appendix C 
Proof of Proposition 2

We first prove that DWM-n policy is an OPF policy and is thus rate-function delay-optimal. The proof 
follows immediately from a property of the MVM in bipartite graphs. We restate this property in the 
following lemma. 

Lemma 9 (Lemma 6 of ^1^): Consider a bipartite graph, and the k heaviest vertices, for some k. If 
there is a matching that matches all the heaviest k vertices, then any MVM matches all of them too. 

Since the DWM-n policy finds an MVM in the constructed bipartite graph, Lemma 9 implies that for any
$k \in \{1, 2, \dots, n\}$, if the k oldest packets can be served by some scheduling policy, then the DWM-n policy
can serve these k packets as well. This completes the first part of the proof.

Next, we prove that the DWM-n policy has a complexity of $O(n^{2.5} \log n)$. Note that in order to select
the n oldest packets in the system, it is sufficient to sort the packets picked by the DWM policy, i.e.,
the n oldest packets of each of the n queues, as no other packets can be among the n oldest packets
in the system. The complexity of sorting these packets [19] is $O(n^2 \log n)$. Given the n oldest packets in
the system, the DWM-n policy constructs an $n \times n$ bipartite graph and finds an MVM [11] in $O(n^{2.5} \log n)$
time. Hence, the overall complexity of DWM-n is $O(n^{2.5} \log n)$, which completes the proof.

Appendix D 
Proof of Proposition 3

The following simple counter-example shows that DWM-n cannot stabilize a feasible arrival rate vector, 
and is thus not throughput-optimal in general. 

Consider a system with two queues and two servers, i.e., a system with n = 2. We assume the i.i.d.
ON-OFF channel model as in Assumption 4, i.e., each server is connected to each queue with probability
$q \in (0, 1)$, and is disconnected otherwise. In each time-slot, a server can serve at most one packet of a
queue that is connected to this server. In such a system, the optimal throughput region can be described
as $\Lambda^* = \{\lambda \mid \lambda_1 \le 2q,\ \lambda_2 \le 2q, \text{ and } \lambda_1 + \lambda_2 \le 2(2q - q^2)\}$, where the first two inequalities are obvious,
and the last inequality is due to the following. For each of the two servers, the probability that at least
one queue is connected to the server is $2q - q^2$; hence, the service each server can provide is $2q - q^2$,
and the total (effective) capacity is thus $2(2q - q^2)$. Note that any arrival rate vector $\lambda$ strictly inside the
optimal throughput region $\Lambda^*$ is feasible.

Next, we construct an arrival process as follows. Consider a frame consisting of two time-slots. In
each frame, there are packet arrivals to the system with probability $p \in (0, 1)$, and no arrivals otherwise.
In a frame that has arrivals, there are K packet arrivals to queue $Q_1$ and no arrivals to queue $Q_2$ in the
first time-slot, and there are no arrivals to queue $Q_1$ and K packet arrivals to queue $Q_2$ in the second
time-slot, where we assume that K > 4. This type of arrival process yields an arrival rate vector of
$\lambda^* = [pK/2, pK/2]$. It is easy to check that $\lambda^*$ is feasible if $pK < 4q - 2q^2$.

Now, we characterize an upper bound of the service rate under the DWM-n policy. Recall that DWM-n
considers only the n oldest packets in the system and maximizes the sum of the delays of the packets
scheduled over these n packets, and no other packets will be scheduled. Hence, in each time-slot, DWM-n
considers only the two oldest packets in the system. Consider any time-slot $t_1$, where K − 1 out of
the K packets arriving to queue $Q_1$ in the same time-slot are still waiting in the system. The other
packet could have been scheduled with a packet in $Q_2$, or with a packet that arrived to $Q_1$ earlier, or it
could have been scheduled alone in a time-slot before $t_1$. Note that the first K − 2 packets out of these
K − 1 packets cannot be scheduled with packets in queue $Q_2$, due to the operations of DWM-n. Hence,
in any time-slot $t_2$ before these K − 1 packets are completely evacuated, each server must serve queue
$Q_1$ if this server is connected to queue $Q_1$, and no server will serve $Q_2$ even if this server is connected
to queue $Q_2$, as the packets of $Q_2$ are not among the two oldest packets in the system in such a time-slot
$t_2$. Hence, the expected service rate for these K − 2 packets is 2q, and it thus takes $\frac{K-2}{2q}$ time-slots on
average to evacuate the K − 2 packets. Similarly, it takes $\frac{K-2}{2q}$ time-slots on average to evacuate such
K − 2 packets in queue $Q_2$. Therefore, the total service rate of the system under DWM-n is no greater
than $2K / (2(K-2)/(2q)) = 2qK/(K-2)$. It is clear that the system is unstable if the total arrival rate is
greater than the total service rate, i.e., $pK > 2qK/(K-2)$. Then, by choosing p = 17/96, q = 1/2 and
K = 8, we obtain a feasible arrival rate vector $\lambda^*$ that cannot be stabilized by DWM-n. This completes
the proof.
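
The arithmetic behind the chosen parameters can be checked with exact fractions. The short sketch below is only a verification aid; the function name and the returned tuple are our own, and the per-queue arrival rate pK/2 is derived from the frame structure above (K packets per queue per two-slot frame, with probability p).

```python
from fractions import Fraction


def check_counterexample(p, q, K):
    """Return (feasible, unstable, service_bound) for the two-queue example."""
    per_queue_rate = p * K / 2      # each queue gets K packets per 2-slot frame w.p. p
    total_rate = p * K              # total arrival rate per time-slot
    feasible = per_queue_rate < 2 * q and total_rate < 2 * (2 * q - q ** 2)
    service_bound = 2 * q * K / (K - 2)   # upper bound on DWM-n's total service rate
    unstable = total_rate > service_bound
    return feasible, unstable, service_bound


feasible, unstable, service_bound = check_counterexample(Fraction(17, 96), Fraction(1, 2), 8)
print(feasible, unstable, service_bound)
```

For p = 17/96, q = 1/2 and K = 8, the total arrival rate is 17/12. This lies below the capacity 3/2, so the vector is feasible, but above the bound 4/3 on the service rate of DWM-n.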

Appendix E 
Proof of Theorem 4

Suppose that the sufficient condition is satisfied under policy P, i.e., there exists a constant $M > 0$
such that in any time-slot t and for all $j \in \{1, 2, \dots, n\}$, queue $Q_{i(j,t)}$ satisfies that $W_{i(j,t)}(t) \ge Z_{r,M}(t)$
for all $r \in \mathcal{S}_j(t)$ such that $Q_r(t) \ge M$. We want to show that policy P can stabilize any arrival rate
vector $\lambda$ strictly inside the optimal throughput region $\Lambda^*$.

Recall that $Q_i(t)$ denotes the queue length of $Q_i$ at the beginning of time-slot t, $Z_{i,l}(t)$ denotes the
delay of the l-th packet of $Q_i$ at the beginning of time-slot t, $W_i(t)$ denotes the HOL delay of $Q_i$ at the
beginning of time-slot t, and $C_{i,j}(t)$ denotes the connectivity between queue $Q_i$ and server $S_j$ in time-slot
t. Let $Y_{i,j}(t)$ denote the service of queue $Q_i$ received from server $S_j$ in time-slot t, i.e., $Y_{i,j}(t) = C_{i,j}(t)$
if server $S_j$ is allocated to serve queue $Q_i$, and $Y_{i,j}(t) = 0$ otherwise. We define the random process
describing the behavior of the underlying system as $\mathcal{X} = (X(t), t = 0, 1, 2, \dots)$, where

\[ X(t) \triangleq \{ (Z_{i,1}(t), Z_{i,2}(t), \dots, Z_{i,Q_i(t)}(t)),\ 1 \le i \le n;\ C_{i,j}(t),\ 1 \le i \le n,\ 1 \le j \le n \}. \]

The norm of $X(t)$ is defined as $\|X(t)\| \triangleq \sum_{1 \le i \le n} Q_i(t) + \sum_{1 \le i \le n} W_i(t)$. Let $\mathcal{X}^{(x)}$ denote a process
$\mathcal{X}$ with an initial condition such that

\[ \|X^{(x)}(0)\| = x. \tag{4} \]

The following lemma was derived in [20] for continuous-time countable Markov chains, and it follows
from more general results in [21] for discrete-time countable Markov chains.

Lemma 10: Suppose that there exist a real number $\epsilon > 0$ and an integer $T > 0$ such that for any
sequence of processes $\{X^{(x)}(xT), x = 1, 2, \dots\}$, we have

\[ \limsup_{x \to \infty} E\left[ \frac{1}{x} \|X^{(x)}(xT)\| \right] \le 1 - \epsilon, \tag{5} \]

then the Markov chain $\mathcal{X}$ is stable.

Lemma 10 implies the stability of the network, and a stability criterion of type (5) leads to a fluid limit
approach [13] to the stability problem of queueing systems.

In the following, we construct the fluid limit model of the system as in iQ, lfT3l . We assume that 
the packets present in the system in its initial state $X^{(x)}(0)$ arrived in some of the past time-slots
$-(x-1), -(x-2), \dots, 0$, according to their delays in state $X^{(x)}(0)$. We define another process
$\mathcal{Y} = (A, Q, W, Y)$, i.e., a tuple that denotes a list of processes, and clearly, a sample path of $\mathcal{Y}^{(x)}$ uniquely
defines the sample path of $X^{(x)}$. Then, we extend the definition of $\mathcal{Y}$ to each continuous time $t \ge 0$ as
$\mathcal{Y}^{(x)}(t) = \mathcal{Y}^{(x)}(\lfloor t \rfloor)$, where $\lfloor t \rfloor$ denotes the integer part of t.

Next, we consider a sequence of processes $\{\frac{1}{x_m} \mathcal{Y}^{(x_m)}(x_m \cdot)\}$ that are scaled in both time and space.
Then, using the techniques of Theorem 4.1 of [13] or Lemma 1 of [9], we can show that for almost all
sample paths and for any sequence of processes $\{\frac{1}{x_m} \mathcal{Y}^{(x_m)}(x_m \cdot)\}$, where $\{x_m\}$ is a sequence of positive
integers with $x_m \to \infty$, there exists a subsequence $\{x_{m_l}\}$ with $x_{m_l} \to \infty$ as $l \to \infty$ such that the
following convergences hold uniformly over compact (u.o.c.) intervals:

\begin{align}
\frac{1}{x_{m_l}} \int_0^{x_{m_l} t} A_i^{(x_{m_l})}(\tau)\, d\tau &\to \lambda_i t, \tag{6} \\
\frac{1}{x_{m_l}} \int_0^{x_{m_l} t} Y_{i,j}^{(x_{m_l})}(\tau)\, d\tau &\to \int_0^t y_{i,j}(\tau)\, d\tau, \tag{7} \\
\frac{1}{x_{m_l}} Q_i^{(x_{m_l})}(x_{m_l} t) &\to q_i(t). \tag{8}
\end{align}

Similarly, the following convergence (which is denoted by "$\Rightarrow$") holds at every continuous point of the
limiting function $w_i(t)$:

\[ \frac{1}{x_{m_l}} W_i^{(x_{m_l})}(x_{m_l} t) \Rightarrow w_i(t). \tag{9} \]

Any set of limiting functions $(q, y, w)$ is called a fluid limit. It is easy to show that the limiting functions
are Lipschitz continuous in $[0, \infty)$, and are thus absolutely continuous. Therefore, these limiting functions
are differentiable at almost all (scaled) time $t \in [0, \infty)$, which we call regular time. Moreover, the limiting
functions satisfy that

\[ \sum_{1 \le i \le n} y_{i,j}(t) \le 1, \quad \text{for all } j \in \{1, 2, \dots, n\}, \tag{10} \]

and that

\[ \frac{d}{dt} q_i(t) =
\begin{cases}
\lambda_i - \sum_j y_{i,j}(t), & q_i(t) > 0, \\
\left( \lambda_i - \sum_j y_{i,j}(t) \right)^+, & q_i(t) = 0.
\end{cases} \tag{11} \]

We then prove the stability of the fluid limit model using a standard Lyapunov technique. We consider a
quadratic Lyapunov function in the fluid limit model of the system, and show that the Lyapunov function
has a negative drift when its value is greater than 0, which implies that the fluid limit model is stable.

Using a similar argument as in [8], [9], we can show that under policy P, there exists a finite time
$T_1 > 0$ such that for all $t \ge T_1$, we have

\[ q_i(t) = \lambda_i w_i(t) \tag{12} \]

for all i. This linear relation is similar to Little's law and plays a key role in proving stability of the
delay-based schemes. We omit the proof of this linear relation for brevity and refer readers to [8], [9].

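Let $V(q(t))$ denote the Lyapunov function defined as

\[ V(q(t)) \triangleq \frac{1}{2} \sum_{i=1}^{n} \frac{q_i^2(t)}{\lambda_i}. \tag{13} \]

Suppose that $\lambda$ is strictly inside $\Lambda^*$; then there exists a vector $\mu \in \mathcal{R}$ such that $\lambda_i < \sum_{j=1}^{n} \mu_{i,j}$ for all
i. Let $\beta$ denote the smallest difference between $\lambda_i$ and $\sum_{j=1}^{n} \mu_{i,j}$, i.e., $\beta \triangleq \min_{1 \le i \le n} (\sum_{j=1}^{n} \mu_{i,j} - \lambda_i)$.
Clearly, we have $\beta > 0$. It suffices to show that for any $\zeta_1 > 0$, there exist a $\zeta_2 > 0$ and a finite
time $T_2 > 0$ such that for all regular time $t \ge T_2$, $V(q(t)) \ge \zeta_1$ implies $\frac{D^+}{dt^+} V(q(t)) \le -\zeta_2$, where
$\frac{D^+}{dt^+} V(q(t)) = \lim_{\delta \downarrow 0} \frac{V(q(t+\delta)) - V(q(t))}{\delta}$. Choose any $T_2 > T_1$. Since $q(t)$ is differentiable for all regular
time $t \ge T_2$ such that $V(q(t)) > 0$, we can obtain the derivative of $V(q(t))$ as

\begin{align}
\frac{D^+}{dt^+} V(q(t))
&\overset{(a)}{=} \sum_{i=1}^{n} \frac{q_i(t)}{\lambda_i} \Big( \lambda_i - \sum_{j=1}^{n} y_{i,j}(t) \Big) \nonumber \\
&\overset{(b)}{=} \sum_{i=1}^{n} w_i(t) \Big( \lambda_i - \sum_{j=1}^{n} y_{i,j}(t) \Big) \nonumber \\
&= \sum_{i=1}^{n} w_i(t) \Big( \lambda_i - \sum_{j=1}^{n} \mu_{i,j} + \sum_{j=1}^{n} \mu_{i,j} - \sum_{j=1}^{n} y_{i,j}(t) \Big) \tag{14} \\
&= \sum_{i=1}^{n} w_i(t) \Big( \lambda_i - \sum_{j=1}^{n} \mu_{i,j} \Big)
 + \sum_{j=1}^{n} \Big( \sum_{i=1}^{n} w_i(t) \mu_{i,j} - \sum_{i=1}^{n} w_i(t) y_{i,j}(t) \Big), \nonumber
\end{align}

where (a) is from (11), and (b) is from (12) along with a little algebra.

From (12) and (13), we can choose $\zeta_3 > 0$ such that $V(q(t)) \ge \zeta_1$ implies $\max_{1 \le i \le n} w_i(t) \ge \zeta_3$.
Then, in the final result of (14), we can conclude that the first term is bounded. That is,

\[ \sum_{i=1}^{n} w_i(t) \Big( \lambda_i - \sum_{j=1}^{n} \mu_{i,j} \Big)
 \le -\zeta_3 \min_{1 \le i \le n} \Big( \sum_{j=1}^{n} \mu_{i,j} - \lambda_i \Big)
 \le -\zeta_3 \beta \triangleq -\zeta_2 < 0. \]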
Therefore, we have that $\frac{D^+}{dt^+} V(q(t)) \le -\zeta_2$ if the second term in the final result of (14) is non-positive.
We show this in the following.

Considering the neighborhood around a fixed (scaled) time $t \ge T_2$, we define
$N \triangleq \{\lceil x_{m_l} t \rceil, \lceil x_{m_l} t \rceil + 1, \dots, \lfloor x_{m_l}(t+\delta) \rfloor\}$, where $\delta$ is a small positive number and $\{x_{m_l}\}$ is a
positive subsequence for which the convergence to the fluid limit holds. We will omit the superscript $(x_{m_l})$
of the random variables (depending on the choice of the sequence $\{x_{m_l}\}$) throughout the rest of the proof
for notational convenience (e.g., we use $Q_i(t)$ to denote $Q_i^{(x_{m_l})}(t)$). We want to show that under policy P,
in each time-slot $\tau \in N$, each server $S_j$ serves a connected queue $Q_{i(j,\tau)}$ having the largest weight in the
fluid limits, i.e., $w_{i(j,\tau)}(t) = L_j(\tau) \triangleq \max_{i \in \mathcal{S}_j(\tau)} w_i(t)$ (recall that $\mathcal{S}_j(\tau) = \{1 \le i \le n \mid C_{i,j}(\tau) = 1\}$).
Note that the statement holds trivially if $\mathcal{S}_j(\tau) = \emptyset$ or $L_j(\tau) = 0$. Hence, suppose that $\mathcal{S}_j(\tau) \ne \emptyset$
and $L_j(\tau) > 0$. Consider $r, s \in \mathcal{S}_j(\tau)$






such that Ws{t) = Lj{T) and Wrir) = maxjg^^^^) 14^j(r). In other words, Qs is a queue having the 
largest weight in the fluid limit among all the queues being connected to server Sj in time-slot r, and 
Qr is a queue having the largest weight in the original discrete -time system among all the queues being 
connected to server Sj in time-slot r. Note that it is possible that r = s. Then, for any time-slot r G A^, 
we have that 

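$$
\begin{aligned}
W_{i(j,\tau)}(\lceil x_{m_l} t \rceil)
&\overset{(a)}{\ge} W_{i(j,\tau)}(\tau) - (\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil) \\
&\overset{(b)}{\ge} Z_{r,M}(\tau) - (\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil) \\
&= Z_{r,M}(\tau) - W_r(\tau) + W_r(\tau) - (\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil) \\
&\overset{(c)}{\ge} Z_{r,M}(\tau) - W_r(\tau) + W_s(\tau) - (\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil) \\
&\overset{(d)}{\ge} Z_{r,M}(\tau) - W_r(\tau) + W_s(\lfloor x_{m_l}(t+\delta) \rfloor) - 2(\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil),
\end{aligned} \tag{15}
$$

where (a) and (d) are due to the fact that the HOL delay cannot increase by more than $\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil$ within $\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil$ time-slots, (b) is from the property of policy $P$ satisfying the sufficient conditions, and (c) is due to $W_r(\tau) = \max_{i \in \mathcal{S}_j(\tau)} W_i(\tau)$ and $s \in \mathcal{S}_j(\tau)$. Dividing both sides of the final result of the above equation by $x_{m_l}$ and letting $x_{m_l}$ go to infinity, we have that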

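$$
\begin{aligned}
w_{i(j,\tau)}(t)
&\overset{(a)}{=} \lim_{l \to \infty} \frac{W_{i(j,\tau)}(\lceil x_{m_l} t \rceil)}{x_{m_l}} \\
&\overset{(b)}{\ge} \lim_{l \to \infty} \frac{Z_{r,M}(\tau) - W_r(\tau)}{x_{m_l}} + \lim_{l \to \infty} \frac{W_s(\lfloor x_{m_l}(t+\delta) \rfloor)}{x_{m_l}} - 2\delta \\
&\overset{(c)}{=} w_s(t+\delta) - 2\delta,
\end{aligned} \tag{16}
$$

where (a) is from the definition of fluid limits, (b) is from (15) and $\lim_{l \to \infty} \frac{\lfloor x_{m_l}(t+\delta) \rfloor - \lceil x_{m_l} t \rceil}{x_{m_l}} = \delta$, and (c) is because $\lim_{l \to \infty} \frac{Z_{r,M}(\tau) - W_r(\tau)}{x_{m_l}} = 0$, since the SLLN holds and $Z_{r,M}(\tau) - W_r(\tau)$ is the difference of the arrival times of two packets having a finite number of packets in-between. Since the above equation holds for any arbitrarily small positive number $\delta$, by letting $\delta$ go to 0 on both sides of the final result of the above equation, we have $w_{i(j,\tau)}(t) \ge w_s(t) = L_j(\tau)$, and in particular, we have $w_{i(j,\tau)}(t) = L_j(\tau)$. This is true for each $j$ and for each $\tau \in \mathcal{N}$. Therefore, under policy $P$, the service vector $y(t)$ satisfies that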

$$\sum_{i=1}^{n} w_i(t)\, y_{ij}(t) = \max_{\nu} \sum_{i=1}^{n} w_i(t)\, \nu_{ij},$$

for all $j \in \{1, 2, \dots, n\}$. Thus, we have that

$$y(t) \in \arg\max_{\nu} \sum_{j=1}^{n} \sum_{i=1}^{n} w_i(t)\, \nu_{ij}, \tag{17}$$

which implies that

$$\sum_{j=1}^{n} \sum_{i=1}^{n} w_i(t)\, \nu_{ij} \le \sum_{j=1}^{n} \sum_{i=1}^{n} w_i(t)\, y_{ij}(t). \tag{18}$$

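Therefore, this shows that $V(q(t)) \ge \zeta_1$ implies $\frac{D^+}{dt^+} V(q(t)) \le -\zeta_2$ for all regular times $t > T_2$. It immediately follows that for any $\zeta > 0$, there exists a finite $T > T_2 > 0$ such that $\sum_{1 \le i \le n} q_i(T) \le \zeta$. Further, we have that

$$\sum_{1 \le i \le n} \bigl( q_i(T) + w_i(T) \bigr) \le \Bigl( 1 + \frac{1}{\min_{1 \le i \le n} \lambda_i} \Bigr) \zeta$$

due to the linear relation (12).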
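
The bound above can be unpacked in one line of algebra. The following is only a sketch: it assumes that the linear relation (12) takes the form $q_i(T) = \lambda_i w_i(T)$ for each $i$, which is our reading of the surrounding argument, since (12) itself is not restated in this appendix.

```latex
\begin{align*}
\sum_{1 \le i \le n} \bigl( q_i(T) + w_i(T) \bigr)
  &= \sum_{1 \le i \le n} \Bigl( 1 + \frac{1}{\lambda_i} \Bigr) q_i(T)
     && \text{(assumed form of (12): } w_i(T) = q_i(T)/\lambda_i \text{)} \\
  &\le \Bigl( 1 + \frac{1}{\min_{1 \le i \le n} \lambda_i} \Bigr) \sum_{1 \le i \le n} q_i(T) \\
  &\le \Bigl( 1 + \frac{1}{\min_{1 \le i \le n} \lambda_i} \Bigr) \zeta .
\end{align*}
```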

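Now, consider any fixed sequence of processes $\{\mathcal{X}^{(x)}, x = 1, 2, \dots\}$ (for simplicity also denoted by $\{x\}$). From the convergences to the fluid limits, we have that for any subsequence $\{x_m\}$ of $\{x\}$, there exists a further (sub)subsequence $\{x_{m_l}\}$ such that

$$\lim_{l \to \infty} \frac{1}{x_{m_l}} \bigl\| \mathcal{X}^{(x_{m_l})}(x_{m_l} T) \bigr\| = \sum_{1 \le i \le n} \bigl( q_i(T) + w_i(T) \bigr) \le \Bigl( 1 + \frac{1}{\min_{1 \le i \le n} \lambda_i} \Bigr) \zeta$$

almost surely. This in turn implies (for small enough $\zeta$) that

$$\lim_{x \to \infty} \frac{1}{x} \bigl\| \mathcal{X}^{(x)}(xT) \bigr\| \le \Bigl( 1 + \frac{1}{\min_{1 \le i \le n} \lambda_i} \Bigr) \zeta \triangleq 1 - \epsilon < 1 \tag{19}$$

almost surely.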

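We can show that the sequence $\{\frac{1}{x} \| \mathcal{X}^{(x)}(xT) \|, x = 1, 2, \dots\}$ is uniformly integrable, due to the following:

$$\frac{1}{x} \bigl\| \mathcal{X}^{(x)}(xT) \bigr\| \le 1 + \frac{1}{x} \sum_{i=1}^{n} \int_{0}^{xT} A_i(\tau)\, d\tau + nT$$

and

$$E\Bigl[ 1 + \frac{1}{x} \sum_{i=1}^{n} \int_{0}^{xT} A_i(\tau)\, d\tau + nT \Bigr] < \infty,$$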

where the above finite expectation is from our assumption on the arrival process. Then, the almost sure convergence in (19) along with uniform integrability implies the following convergence in the mean:

$$\limsup_{x \to \infty} E\Bigl[ \frac{1}{x} \bigl\| \mathcal{X}^{(x)}(xT) \bigr\| \Bigr] \le 1 - \epsilon.$$

Since the above convergence holds for any sequence of processes $\{\mathcal{X}^{(x)}(xT), x = 1, 2, \dots\}$, the condition in Lemma 10 is satisfied. This completes the proof of Theorem 4.

Appendix F
Proof of Proposition 5

We prove it by showing that DWM is an MWF policy.

Let $M = n$. We want to show that the sufficient condition in Theorem 4 is satisfied, i.e., in any time-slot $t$ and for all $j \in \{1, 2, \dots, n\}$, the DWM policy allocates server $S_j$ to serve queue $Q_{i(j,t)}$, which satisfies $W_{i(j,t)}(t) \ge Z_{r,n}(t)$ for all $r \in \Gamma_j(t)$ such that $Q_r(t) \ge n$.

Suppose that the sufficient condition is not satisfied, i.e., consider any server $S_j$ such that $Q_r(t) \ge n$ for some $r \in \Gamma_j(t)$, and $S_j$ is allocated to serve queue $Q_{i(j,t)}$, and suppose that $W_{i(j,t)}(t) < Z_{r,n}(t)$.

Since $Q_r(t) \ge n$ and at most $n - 1$ packets could be matched with the other $n - 1$ servers, there must be at least one of the $n$ oldest packets in $Q_r$ remaining unmatched. Suppose this packet is the $k$-th oldest packet in queue $Q_r$; then $Z_{r,k}(t) \ge Z_{r,n}(t) > W_{i(j,t)}(t)$. Hence, the DWM policy must match $S_j$ to the $k$-th oldest packet in queue $Q_r$, i.e., DWM must allocate $S_j$ to serve $Q_r$ rather than $Q_{i(j,t)}$, which is a contradiction.

Therefore, the DWM policy is an MWF policy and is thus throughput-optimal.
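
The condition verified in this proof can be checked mechanically for a single time-slot. The following is a minimal sketch, not the authors' implementation: the function name and the data layout (per-queue lists of packet delays sorted oldest first, and the sets $\Gamma_j(t)$ passed in as given) are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def satisfies_mwf_condition(
    served: Dict[int, int],           # server j -> index i(j,t) of the queue it serves
    gamma: Dict[int, Sequence[int]],  # server j -> queue indices in Gamma_j(t) (taken as given)
    delays: Dict[int, List[int]],     # queue i -> packet delays, oldest packet first
    n: int,                           # threshold M = n used in the proof
) -> bool:
    """Check W_{i(j,t)}(t) >= Z_{r,n}(t) for all r in Gamma_j(t) with Q_r(t) >= n."""
    for j, i in served.items():
        # HOL delay W_i(t): delay of the oldest packet (0 for an empty queue).
        hol = delays[i][0] if delays[i] else 0
        for r in gamma.get(j, ()):
            # Only queues holding at least n packets constrain the allocation.
            if len(delays[r]) >= n:
                z_rn = delays[r][n - 1]  # delay of the n-th oldest packet, Z_{r,n}(t)
                if hol < z_rn:
                    return False
    return True
```

For example, with `n = 2`, `delays = {0: [5, 4], 1: [3, 1]}` and `gamma = {0: [0, 1]}`, serving queue 0 from server 0 passes the check, while serving queue 1 fails it because its HOL delay 3 is below $Z_{0,2} = 4$.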

Appendix G 
Proof of Proposition 6

By an argument similar to that in Theorem 3 of [4], we want to show that under D-MWS, the delay-violation event occurs with at least a constant probability for any fixed delay threshold even if $n$ is large.

Fix any integer $T$, and consider any configuration of queues at the end of time-slot $T$. Also, fix any real number $p' \in (0, p)$.
In time-slot $T + 1$:

By the Chernoff bound, there exists an integer $N_1$ such that for all $n \ge N_1$, with probability at least $1 - e^{-D(p'\|p)n}$, at least $np'$ queues have packet arrivals at the beginning of time-slot $T + 1$. Define $\mu = -2/\log(1 - q)$ and $v = \mu \log n$. Fix an integer $N_2$ such that for all $n \ge N_2$, we have $v \ge 1$. Sort the queues in the order of priority for service under D-MWS, i.e., after sorting, the first queue has the largest weight (HOL delay) with the smallest index; the second queue has the largest weight with the second smallest index, or has the second largest weight with the smallest index if there is only one queue having the largest weight; and so on. Let the set of the first $v$ queues after sorting be $Q^* = \{Q_{i_1}, Q_{i_2}, \dots, Q_{i_v}\}$. Let $E_j$ denote the event that server $S_j$ is not connected with any of the queues in $Q^*$. Then, $P(E_j) = (1 - q)^{v} = (1 - q)^{\mu \log n}$, and we have that

$$P\Bigl( \bigcup_{j=1}^{n} E_j \Bigr) \le \sum_{j=1}^{n} P(E_j) = n (1 - q)^{\mu \log n} = \frac{1}{n}, \tag{20}$$

where the last equality is because $(1 - q)^{\mu \log n} = \exp(\mu \log n \log(1 - q)) = \exp(-2 \log n) = \frac{1}{n^2}$. Thus, with probability at least $1 - \frac{1}{n}$, each server is connected to at least one queue in $Q^*$. According to the operations of D-MWS, a server connected to at least one queue in $Q^*$ must be allocated to one of the queues in $Q^*$. Hence, with probability at least $1 - \frac{1}{n}$, all the servers serve queues in $Q^*$. Since $|Q^*| = v$ and with probability at least $1 - e^{-D(p'\|p)n}$ at least $np'$ queues had packet arrivals, it follows that for $n \ge N_3 = \max\{N_1, N_2\}$, with probability at least $1 - \frac{1}{n} - e^{-D(p'\|p)n}$ by the union bound, at the end of time-slot $T + 1$ (and at the beginning of time-slot $T + 2$), the system has at least $np' - v$ queues having a weight (HOL delay) of at least 1. Let this set of queues (of weight being 1) be $\mathcal{A}_1$.
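
The choice $\mu = -2/\log(1 - q)$ is what makes the union bound in (20) collapse to $1/n$. A small numerical sketch of this identity follows; the function name is hypothetical, and $v = \mu \log n$ is treated as a real number (any integer rounding of $v$ is ignored here, which is an assumption of the sketch).

```python
import math


def union_bound_all_servers(n: int, q: float) -> float:
    """Evaluate n * (1 - q)^(mu * log n) with mu = -2 / log(1 - q).

    This is the right-hand side of the union bound (20): an upper bound on
    the probability that some server is connected to none of the v queues
    in Q*. Requires n >= 2 and 0 < q < 1.
    """
    mu = -2.0 / math.log(1.0 - q)   # constant defined in the proof
    v = mu * math.log(n)            # size of Q* (rounding ignored in this sketch)
    p_single = (1.0 - q) ** v       # P(E_j) for one server, equals n^(-2)
    return n * p_single             # union bound over the n servers
```

Evaluating `union_bound_all_servers(100, 0.3)` gives approximately `0.01`, i.e. $1/n$, and the result does not depend on the value of $q$.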
In time-slot T + 2: 

By an argument similar to the one above for time-slot $T + 1$, it follows that, with probability at least $1 - \frac{1}{n}$, no more than $v$ queues can receive service. Combining this with the result for time-slot $T + 1$ and using the union bound, we have that for all $n \ge N_3$, with probability at least $1 - \frac{2}{n} - e^{-D(p'\|p)n}$, at the end of time-slot $T + 2$ (and at the beginning of time-slot $T + 3$), there exists a set $\mathcal{A}_2$ of queues such that $|\mathcal{A}_2| \ge |\mathcal{A}_1| - v \ge np' - 2v$, and each queue in $\mathcal{A}_2$ has a weight (HOL delay) of at least 2.

Repeating the same argument, we have that for all $n \ge N_3$, with probability at least $1 - \frac{b+1}{n} - e^{-D(p'\|p)n}$, at the end of time-slot $T + b + 1$, there exists a set $\mathcal{A}_{b+1}$ of queues such that $|\mathcal{A}_{b+1}| \ge np' - (b + 1)v$, and each queue in $\mathcal{A}_{b+1}$ has a weight (HOL delay) of at least $b + 1$.

Fix a real number $\epsilon \in (0, 1)$. There exists an integer $N_4$ such that for all $n \ge N_4$, we have $1 - \frac{b+1}{n} - e^{-D(p'\|p)n} \ge \epsilon$ and $np' - (b + 1)v = np' - (b + 1)\mu \log n \ge 1$. Hence, for a system with $n \ge \max\{N_3, N_4\}$, starting with time-slot $T$, with probability at least $\epsilon$, we have at least one queue having a HOL delay of at least $b + 1$ at the end of time-slot $T + b + 1$ (or at the beginning of time-slot $T + b + 2$). Let $T = -b - 2$; then the above result shows that the delay-violation event occurs with at least a constant probability even if $n$ is large. This completes the proof.
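
To get a feel for how large $n$ must be in the last step, the two requirements on $N_4$ can be evaluated numerically. This is a rough sketch under stated assumptions, not part of the proof: it assumes that $D(p'\|p)$ is the Kullback-Leibler divergence between Bernoulli($p'$) and Bernoulli($p$), that the accumulated union bound after $b + 1$ slots is $\frac{b+1}{n} + e^{-D(p'\|p)n}$, and all function names are hypothetical.

```python
import math
from typing import Optional, Tuple


def kl_bernoulli(a: float, b: float) -> float:
    """KL divergence D(a || b) between Bernoulli(a) and Bernoulli(b) (assumed meaning of D)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


def margins(n: int, p: float, p_prime: float, q: float, b: int) -> Tuple[float, float]:
    """Return (probability lower bound, lower bound on |A_{b+1}|) for a system of size n."""
    mu = -2.0 / math.log(1.0 - q)
    prob = 1.0 - (b + 1) / n - math.exp(-kl_bernoulli(p_prime, p) * n)
    backlog = n * p_prime - (b + 1) * mu * math.log(n)
    return prob, backlog


def find_threshold(p: float, p_prime: float, q: float, b: int, eps: float,
                   n_max: int = 10**6) -> Optional[int]:
    """Return an n from which both conditions on N_4 hold, or None if not found.

    The search starts at (b+1)*mu/p', beyond which n*p' - (b+1)*mu*log(n) is
    increasing in n; the probability bound is increasing in n everywhere, so
    both conditions keep holding for every larger n.
    """
    mu = -2.0 / math.log(1.0 - q)
    start = max(2, math.ceil((b + 1) * mu / p_prime))
    for n in range(start, n_max + 1):
        prob, backlog = margins(n, p, p_prime, q, b)
        if prob >= eps and backlog >= 1.0:
            return n
    return None
```

With the illustrative values $p = 0.5$, $p' = 0.4$, $q = 0.5$, $b = 2$, and $\epsilon = 0.5$, the returned threshold is slightly above 100, and it is the backlog condition rather than the probability condition that binds.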



Appendix H
Proof of Theorem 7

We first show that a hybrid OPF-MWF policy is an (overall) OPF policy and is thus rate-function delay-optimal. Note that in stage 1, the operations of an OPF policy already guarantee that the sufficient condition in Theorem 1 is satisfied. Since the servers and packets matched in stage 1 are not considered in stage 2, the stage-2 operations do not perturb the satisfaction of the sufficient condition for rate-function delay optimality.

In the following, we want to show that a hybrid OPF-MWF policy is an (overall) MWF policy and is thus throughput-optimal. Let $M = n$. We want to show that the sufficient condition in Theorem 4 is satisfied, i.e., in any time-slot $t$ and for all $j \in \{1, 2, \dots, n\}$, a hybrid OPF-MWF policy allocates server $S_j$ to serve queue $Q_{i(j,t)}$, which satisfies $W_{i(j,t)}(t) \ge Z_{r,n}(t)$ for all $r \in \Gamma_j(t)$ such that $Q_r(t) \ge n$.

First, we want to show that in stage 1, an OPF policy also guarantees that all servers allocated in stage 1 satisfy the sufficient condition for throughput optimality. Consider each server $S_l$ such that $l \in \{1, 2, \dots, n\} \setminus R(t)$, i.e., all servers $S_l$ that are allocated in stage 1. Then, $Q_{i(l,t)}$ is the queue served by server $S_l$ in stage 1 of time-slot $t$. Since we run an OPF policy in stage 1, server $S_l$ serves a packet among the $n$ oldest packets in the system, and it must satisfy $W_{i(l,t)}(t) \ge Z_{r,n}(t)$ for any $r \in \Gamma_l(t)$ such that $Q_r(t) \ge n$.

Next, consider each server $S_j$ such that $j \in R(t)$; then $Q_{i(j,t)}$ is the queue served by server $S_j$ in stage 2 of time-slot $t$. It is clear from Condition 2) of Definition 3 that $W_{i(j,t)}(t) \ge Z_{r,n}(t)$ for all $r \in \Gamma_j(t)$ such that $Q_r(t) \ge n$.

Therefore, a hybrid OPF-MWF policy is an (overall) MWF policy and is thus throughput-optimal.

Appendix I 
Proof of Theorem 8

To show that DWM-$n$-MWS is a hybrid OPF-MWF policy, it is sufficient to show that Condition 2) of Definition 3 is satisfied.

Given any time-slot $t$, consider each server $S_j$ such that $j \in R(t)$; then $Q_{i(j,t)}$ is the queue served by server $S_j$ in stage 2 under D-MWS. Let $M = n$. We want to show that $W_{i(j,t)}(t) \ge Z_{r,n}(t)$ for all $r \in \Gamma_j(t)$ such that $Q_r(t) \ge n$.

Let $W'_i(t)$ be the HOL delay of queue $Q_i$ at the beginning of stage 2. Let $\Gamma'_j(t)$ denote the set of queues that are connected to server $S_j$ and have the largest weight among the connected queues at the beginning of stage 2 of time-slot $t$, i.e., $\Gamma'_j(t) = \{i \in \mathcal{S}_j(t) \mid W'_i(t) = \max_{l \in \mathcal{S}_j(t)} W'_l(t)\}$, where $\mathcal{S}_j(t) = \{1 \le i \le n \mid C_{i,j}(t) = 1\}$. According to the operations of D-MWS, the index of the queue that is served by server $S_j$ satisfies $i(j,t) = \min\{i \mid i \in \Gamma'_j(t)\}$; hence, we have $W'_{i(j,t)}(t) = W'_r(t)$ for any $r \in \Gamma'_j(t)$. This implies that $W_{i(j,t)}(t) \ge W'_{i(j,t)}(t) = W'_r(t) \ge Z_{r,n}(t)$ for any $r \in \Gamma'_j(t)$ such that $Q_r(t) \ge n$, where the last inequality is because $Q_r(t) \ge n$ and thus the HOL packet of queue $Q_r$ at the beginning of stage 2 must not have a later position than the $n$-th packet in queue $Q_r$ at the beginning of time-slot $t$. This holds for all $j \in R(t)$ and any time-slot $t$. Therefore, DWM-$n$-MWS is a hybrid OPF-MWF policy.

Since the complexity of DWM-$n$ and D-MWS is $O(n^{2.5} \log n)$ and $O(n^2)$, respectively, the overall complexity of the DWM-$n$-MWS policy is $O(n^{2.5} \log n)$.
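
The D-MWS selection rule used in this proof, $i(j,t) = \min\{i \mid i \in \Gamma'_j(t)\}$, can be sketched as follows. This illustrates only the rule quoted above, under assumed conventions: the function name and the matrix layout `connectivity[i][j]` are hypothetical, weights are taken as fixed for all servers within the stage, and `connectivity` is assumed to be non-empty.

```python
from typing import Dict, List


def d_mws_allocate(hol_delay: List[int], connectivity: List[List[int]]) -> Dict[int, int]:
    """Map each server j to the queue chosen by the D-MWS rule.

    hol_delay[i] is the HOL delay W'_i(t) of queue i at the start of the stage;
    connectivity[i][j] == 1 means queue i is connected to server j.
    Servers with no connected queue are left out of the result.
    """
    n_queues = len(hol_delay)
    n_servers = len(connectivity[0])
    allocation: Dict[int, int] = {}
    for j in range(n_servers):
        # S_j(t): queues connected to server j in this time-slot.
        connected = [i for i in range(n_queues) if connectivity[i][j] == 1]
        if not connected:
            continue
        # Gamma'_j(t): connected queues attaining the largest HOL delay.
        best = max(hol_delay[i] for i in connected)
        # Ties are broken in favor of the smallest queue index.
        allocation[j] = min(i for i in connected if hol_delay[i] == best)
    return allocation
```

For instance, `d_mws_allocate([3, 5, 5], [[1, 0], [1, 1], [0, 1]])` assigns both servers to queue 1. The double loop over servers and queues takes on the order of $n^2$ steps for $n$ queues and $n$ servers, in line with the $O(n^2)$ figure stated for D-MWS.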
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